29 August 2018

Widespread Architectural Change Part 1

Last month I introduced Four Categories of Architectural Refactoring:
  • Substitute Architectural Decision
  • Refactor Architectural Structure
  • Widespread Architectural Change
  • Other Architectural Changes
Today I want to start discussing options to perform Widespread Architectural Changes. This category contains all kind of small changes repeated throughout the whole code base. The number of these changes is high, being hundreds or sometimes even thousands of occurrences that have to be changed in a similar way. Examples of such modifications are:
  • Migrating or upgrading (similar) APIs, frameworks or libraries
  • Changing or unifying coding conventions
  • Fixing compiler warnings and removing technical debt
  • Consistently applying or changing aspects like logging or security
  • Applying internationalization
  • Migrate between languages, e.g. SQL vs. HQL
Ways to do Widespread Architectural Changes (C) by SoftDevGang 2016All of them do not change the overall structure, they just work on the implementation. Often substituting architectural decisions or altering architectural structure contain similar changes.

Options
I have been using some techniques since 2004 because as Code Cop I value code consistency. Two years ago I had the opportunity to discuss the topic with Alexandru Bolboaca and other experienced developers during a small unconference and we came up with even more options, as shown in the picture on the right. The goal of this and the next articles is to introduce you to these options, the power of each one and the cost of using it. A word of warning: This is a raw list. I have used some but not all of them and I have yet to explore many options in more detail.

Supporting Manual Changes With Fast Navigation
The main challenge of widespread changes is the high number of occurrences. If the change itself is small, finding all occurrences and navigating to them is a major effort. Support for fast navigation would make things easier. The most basic form of this is to modify or delete something and see all resulting compile errors. Of course this only works with static languages. In Eclipse this works very well because Eclipse is compiling the code all the time and you get red markers just in time. Often it is possible to create a list of all lines that need to be changed. In the past have created custom rules of static analysis tools like PMD to find all places I needed to change. Such a list can be used to open the file and jump to the proper line, one source file after another.

As example, here is a Ruby script that converts pmd.xml which is contains the PMD violations from the Apache Maven PMD Plugin to Java stack traces suitable for Eclipse.
#! ruby
require 'rexml/document'

xml_doc = REXML::Document.new(File.new('./target/site/pmd.xml'))
xml_doc.elements.each('pmd/file/violation') do |violation|
  if violation.attributes['class']
    class_name = violation.attributes['package'] + '.' +
                 violation.attributes['class']
    short_name = violation.attributes['class'].sub(/\$.*$/, '')
    line_number = violation.attributes['beginline']
    rule = violation.attributes['rule']

    puts "#{class_name}.m(#{short_name}.java:#{line_number}) #{rule}"
  end
end
The script uses an XML parser to extract class names and line numbers from the violation report. After pasting the output into Eclipse's Stacktrace Console, you can click each line one by one and Eclipse opens each file in the editor with the cursor in the proper line. This is pretty neat. Another way is to script Vim to open each file and navigate to the proper line using vi +<line number> <file name>. Many editors support similar navigation with the pattern <file name>:<line number>.

Enabling fast navigation is a big help. The power this option is high (that is 4 out of 5 on my personal, totally subjective scale) and the effort to find the needed lines and script them might be medium.

Search and Replace (Across File System)
The most straight forward way to change similar code is by Search and Replace. Many tools offer to search and replace across the whole project, allowing to change many files at once. Even converting basic scenarios, which sometimes cover up to 80%, helps a lot. Unfortunately basic search is very limited. I rate its power low and the effort to use it is also low.

Scripted Search and Replace using Regular Expressions
Now Regular Expressions are much more powerful than basic search. Many editors allow Regular Expressions in Search and Replace. I recommend creating a little script. The traditional approach would be Bash with sed and awk but I have used Ruby and Python (or even Perl) to automate lots of different changes. While the script adds extra work to traverse all the source directories and files, the extra flexibility is needed for conditional logic, e.g. adding an import to a new class if it was not imported before. Also in a script Regular Expressions can be nested, i.e. analysing the match of an expression further in a second step. This helps to keep the expressions simple.

For example I used a script to migrate Java's clone() methods from version 1.4 to 5. Java 5 offers covariant return types which can be used for clone(), removing the cast from client code. The following Ruby snippet is called with the name of the class and the full Java source as string:
shortName = shortClassNameFromClassName(className)
if source =~ / Object clone\(\)/
  # use covariant return type for method signature
  source = $` + " #{shortName} clone()" + $'
end
if source =~ /return super\.clone\(\);/
  # add cast to make code compile again
  source = $` + "return (#{shortName}) super.clone();" + $'
end
In the code base where I applied this widespread change, it fixed 90% of all occurrences as clone methods did not do anything else. It also created some broken code which I reverted. I always review automated changes, even large numbers, as jumping from diff to diff is pretty fast with modern tooling. Scripts using Regular Expressions helped me a lot in the past and I rate their power to high. Creating them is some effort, e.g. a medium amount of work.

Macros and Scripts
Many tools like Vim, Emacs, Visual Studio, Notepad++ and IntelliJ IDEA allow creation or recording Keyboard and or mouse macros or Application scripts. With them, frequently used or repetitive sequences of keystrokes and mouse movements can be automated, and that is exactly what we want to do. When using Macros the approach is the opposite as for navigation markers: We find the place of the needed change manually and let the Macro do its magic.

Most people I have talked to know Macros and used them earlier (e.g. 20 years ago) but not recently in modern IDEs. Alex has used Vim Scripts to automate repetitive coding tasks. I have not used it and relating to his experience. Here is a Vim script function which extracts a variable, taken from Gary Bernhardt's dotfiles:
function! ExtractVariable()
  let name = input("Variable name: ")
  if name == ''
    return
  endif

  " Enter visual mode
  normal! gv
  " Replace selected text with the variable name
  exec "normal c" . name
  " Define the variable on the line above
  exec "normal! O" . name . " = "
  " Paste the original selected text to be the variable value
  normal! $p
endfunction
I guess large macros will not be very readable and hard to change but they will be able to do everything what Vim can do, which is everything. ;-) They are powerful and easy to create - as soon as you know Vim script of course.

Structural Search and Replace
IntelliJ IDEA offers Structural Search and Replace which performs search and replace across the whole project, taking advantage of IntelliJ IDEA's awareness of the syntax and code structure of the supported languages.. In other words it is Search and Replace on the Abstract Syntax Tree (AST). Additionally it is possible to apply semantic conditions to the search, for example locate the symbols that are read or written to. It is available for Java and C# (Resharper) and probably in other JetBrains products as well. I have not used it. People who use it tell me that it is useful and easy to use - or maybe not that easy to use. Some people say it is too complicated and they can not make it work.

Online help says that you can apply constraints described as Groovy scripts and make use of IntelliJ IDEA PSI (Program Structure Interface) API for the used programming language. As PSI is the IntelliJ version of the AST, this approach is very powerful but you need to work the PSI/AST which is (by its nature) complicated. This requires a higher effort.

To be continued
And there are many more options to be explored, e.g. scripting code changes inside IDEs or using Refactoring APIs as well as advanced tools outside of IDEs. I will continue my list in the next part. Stay tuned.

15 August 2018

Creating your own NATstyle rules

Last month I showed how to use NATstyle from the command line. NATstyle is the utility to define and check the coding standard of your NATURAL program. Today I want to explain how to customize and create your own rules beyond what is explained in the manual. I used NaturalONE 8.3.5.0.242 CE (November 2016). Likely there are more options and rules available in newer versions of NaturalONE and NATstyle.

Basic Configuration
NATstyle comes packaged inside NaturalONE, the Eclipse-based IDE for NATURAL. As expected NATstyle can be configured in Eclipse preferences. The configuration is saved as NATstyle.xml which is used when you run NATstyle from the right click popup menu. We will need to modify NATstyle.xml later, so let's have a look at it:
<?xml version="1.0" encoding="utf-8"?>
<naturalStyleCheck version="1.0"
                   xmlns="http://softwareag.com/natstyle/rules"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://softwareag.com/natstyle/rules checks.xsd">
  <checks type="source">
    <check class="CheckLineLength" name="Line length" severity="warning">
      <property name="max" value="72" />
      <property name="exclude" value="D3" />
    </check>
    <!-- more checks of type source -->
  </checks>
  <!-- more checks of other types -->
</naturalStyleCheck>
(The default configuration file is here together with its XML schema.)

Existing Rules
The existing rules are described in NaturalONE help topic Overview of NATstyle Rules, Error Messages and Solutions. Version 8.3 has 42 rules. These are only a few compared to PMD or SonarQube, which has more than 1000 rules available for Java. Here are some examples what NATstyle can do:
  • Source Checks: e.g. limit line length, find tab characters, find empty lines, limit the number of source lines and check a regular expressions for single source lines or whole source file.
  • Source Header Checks: e.g. force header or check file naming convention.
  • Parser Checks: e.g. find unused local variables, warn if local variable shadows view, find TODO comments, calculate Cyclomatic and NPath complexity, force NATdoc (documentation) tags and check function, subroutine and class names against regular expressions.
  • Error (Message File) Checks: e.g. check error messages file name.
  • Resource (File) Checks: e.g. check resource file name.
  • Library (Folder) Checks: e.g. library folder conventions, find special folders, force group folders and warn on missing NATdoc library documentation.
Same rule multiple times configured differently
Some rules like Source/Regular expression for single source lines only allow a single regular expression to be configured. Using alternation, e.g. a|b|c, in the expression is a way to overcome that, but the expression gets complicated quickly. Another way is to duplicate the <check> element in the NATstyle.xml configuration. Assume we do not only forbid PRINT statements, we also do not allow reduction to zero. (These rules do not make any sense, they are just here to explain the idea.) The relevant part of NATstyle.xml looks like
<checks type="source">
  <check class="CheckRegExLine"
         name="Regular expression for single source lines" ... >
    <property name="regex" value="PRINT '.*" />
  </check>
  <check class="CheckRegExLine"
         name="Regular expression for single source lines" ... >
    <property name="regex" value="REDUCE .* TO 0" />
  </check>
</checks>
While it is impossible to configure these rules in the NaturalONE preferences, it might be possible to run NATstyle with these modified settings. I did not verify that. I execute NATstyle from the command line passing in the configuration file name using the -c flag. (See the full configuration and script to to run the rules from the command line.)

rock piles of different sizeDefining your own rules
There is no documented way to create new rules for NATstyle. All rules' classes are defined inside the NATstyle plugin. The configuration XML contains a class attribute, which is a short name, e.g. CheckRegExLine. Its implementation is located in the package com.​softwareag.​naturalone.​natural.​natstyle.​check.​src.​source where source is the group of the rules defined in the type attribute of the <checks> element. I experimented a lot and did not find a way to load rules from other packages than com.​softwareag.​naturalone.​natural.​natstyle. All rules must be defined inside this name space, which is possible.

Source Rules
While I cannot see the actual code of NATstyle rules, Java classes expose their public methods and parent class. I did see the names of the rule classes in the configuration and guessed and experimented with the API a lot. My experience with other static analysis tools, e.g. PMD and Pylint and the good method names of NATstyle code helped me doing so. A basic Source rule looks like that:
package com.softwareag.naturalone.natural.natstyle.check.src.source; // 1.

import com.softwareag.naturalone.natural.natstyle.NATstyleCheckerSourceImpl;
// other imports ...

public class FindFooSourceRule
  extends NATstyleCheckerSourceImpl { // 2.

  private Matcher name;

  @Override
  public void initParameterList() {
    name = Pattern.compile("FOO").matcher(""); // 3.
  }

  @Override
  public String run() { // 4.
    StringBuffer xmlOutput = new StringBuffer();

    String[] lines = this.getSourcelines(); // 5.
    for (int line = 0; line < lines.length; i++) {
      name.reset(lines[line]);
      if (name.find()) {
        setError(xmlOutput, line, "Message"); // 6.
      }
    }

    return xmlOutput.toString(); // 7.
  }
}
The marked lines are important:
  1. Because it is a Source rule, it must be in exactly this package - see the paragraph above.
  2. Source rules extend NATstyleCheckerSourceImpl which provides the lines of the NATURAL source file - see line 6. It has more methods, which have reasonable names, use the code completion.
  3. You initialise parameters in initParameterList. I did not figure out how to make the rules configurable from the XML configuration, which will probably happen in here, too.
  4. The run method is executed for each NATURAL file.
  5. NATstyleCheckerSourceImpl provides the lines of the file in getSourcelines. You can iterate the lines and check them.
  6. If there is a problem, call setError. Now setError is a bit weird, because it writes an XML element for the violation report XML (e.g. NATstyleResult.xml) into a StringBuffer.
  7. In the end the return the XML String of all found violations.
Finally the rule is configured with
<checks type="source">
  <check class="FindFooSourceRule"
         name="Find FOO"
         severity="warning" />
</checks>
(In the example repository, there is a working Source rule FindInv02.java together with its configuration customSource.xml.)

Parser Rules
Now it is getting more interesting. There are 18 rules of this type, which is a good start, but we need moar! Parser rules look similar to Source rules:
package com.softwareag.naturalone.natural.natstyle.check.src.parser; // 1.

import com.softwareag.naturalone.natural.natstyle.NATstyleCheckerParserImpl;
// other imports ...

public class SomeParserRule
  extends NATstyleCheckerParserImpl { // 2.

  @Override
  public void initParameterList() {
  }

  @Override
  public String run() {
    StringBuffer xmlOutput = new StringBuffer();

    // create visitor
    getNaturalParser().getNaturalASTRoot().accept(visitor); // 3.
    // collect errors from visitor into xmlOutput

    return xmlOutput.toString();
  }
}
where
  1. Like Source rules, Parser rules must be defined under the package ...natstyle.​check.​src.​parser.
  2. Parser rules extend NATstyleCheckerParserImpl.
  3. The NATURAL parser traverses the AST of the NATURAL code. Similar to other tools, NATstyle uses a visitor, the INaturalASTVisitor. The visitor is called for each node in the AST tree. This is similar to PMD.
Using the Parser
Tree, MuthillThe visitor must implement INaturalASTVisitor in package com.​softwareag.​naturalone.​natural.​parser.​ast.​internal. This interface defines 48 visit methods for the different sub types of INaturalASTNode, e.g. array indices, comments, operands, system function references like LOOP or TRIM, and so on. Still there are never enough node types as the AST does not convey much information about the code, most statements end up as INaturalASTTokenNode. For example the NATURAL lines
* print with leading blanks
PRINT 3X 'Hello'
which are a line comment and a print statement, result in the AST snippet
+ TOKEN: * print with leading blanks
+ TOKEN: PRINT
+ TOKEN: 3X
+ OPERAND
  + SIMPLE_CONSTANT_REFERENCE
    + TOKEN: 'Hello'
Now PRINT is a statement and could be recognised as one and 'Hello' is a string. This makes defining custom rules possible but pretty hard. To help me understand the AST I created a visitor which dumps the tree as XML file, similar to PMD's designer: DumpAstAsXml.java.

Conclusion
With this information you should be able to get started defining your own NATstyle rules. There is always so much more we could and should check automatically.

13 August 2018

This is the last time

It is August, it is hot and it is time to do some alternate work:

Bare Brickwork
Quick Fix
Looking at the picture made me think. Every industry seems to have its quick fixes: Engineers use duct tape and WD-40. In software development, we take short cuts by violating existing structure, e.g. adding another conditional resulting in convoluted logic. And in the construction industry it clearly is the usage of polyurethane or spray foam. As every tool it has its use, and like in the picture it is especially useful for temporary work. Unfortunately most professional construction workers I commissioned wanted to use it for everything - like duct tape: A hole in the wall? Why use brick and mortar when it is faster to put some foam into it. A gap in the boards? Why care to work accurately when it is easier to close the gap with foam afterwards. Not enough steam brake to cover the ceiling? Let's fill the remaining area with foam. You get the idea. While some people like the speed such quick fixes provide, e.g. Joel Spolsky praises the Duct Tape Programmer, I do not agree. Anyway, this project must come to an end and I sincerely hope it is the last time. ;-)