15 August 2018

Creating your own NATstyle rules

Last month I showed how to use NATstyle from the command line. NATstyle is the utility to define and check the coding standard of your NATURAL program. Today I want to explain how to customize and create your own rules beyond what is explained in the manual. I used NaturalONE 8.3.5.0.242 CE (November 2016). There are likely more options and rules available in newer versions of NaturalONE and NATstyle.

Basic Configuration
NATstyle comes packaged inside NaturalONE, the Eclipse-based IDE for NATURAL. As expected, NATstyle can be configured in the Eclipse preferences. The configuration is saved as NATstyle.xml, which is used when you run NATstyle from the right-click popup menu. We will need to modify NATstyle.xml later, so let's have a look at it:
<?xml version="1.0" encoding="utf-8"?>
<naturalStyleCheck version="1.0"
                   xmlns="http://softwareag.com/natstyle/rules"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://softwareag.com/natstyle/rules checks.xsd">
  <checks type="source">
    <check class="CheckLineLength" name="Line length" severity="warning">
      <property name="max" value="72" />
      <property name="exclude" value="D3" />
    </check>
    <!-- more checks of type source -->
  </checks>
  <!-- more checks of other types -->
</naturalStyleCheck>
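To see at a glance which checks a configuration enables, NATstyle.xml can be read programmatically. The following is a minimal sketch using only the JDK's DOM parser; ListNatstyleChecks is a hypothetical helper for illustration, not part of NATstyle. It assumes the namespace and element names shown above: <checks> with a type attribute, <check> with a class attribute, and <property> with name and value.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: lists the checks configured in a NATstyle.xml file.
class ListNatstyleChecks {

    // Namespace declared on <naturalStyleCheck> in NATstyle.xml.
    static final String NS = "http://softwareag.com/natstyle/rules";

    // Returns one entry per <check>, e.g. "source/CheckLineLength max=72".
    static List<String> listChecks(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the ...NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            List<String> result = new ArrayList<>();
            NodeList groups = doc.getElementsByTagNameNS(NS, "checks");
            for (int g = 0; g < groups.getLength(); g++) {
                Element group = (Element) groups.item(g);
                NodeList checks = group.getElementsByTagNameNS(NS, "check");
                for (int c = 0; c < checks.getLength(); c++) {
                    Element check = (Element) checks.item(c);
                    StringBuilder entry = new StringBuilder()
                            .append(group.getAttribute("type")).append('/')
                            .append(check.getAttribute("class"));
                    // Append all <property name=".." value=".."/> children.
                    NodeList props = check.getElementsByTagNameNS(NS, "property");
                    for (int p = 0; p < props.getLength(); p++) {
                        Element prop = (Element) props.item(p);
                        entry.append(' ').append(prop.getAttribute("name"))
                             .append('=').append(prop.getAttribute("value"));
                    }
                    result.add(entry.toString());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read NATstyle configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<naturalStyleCheck xmlns=\"" + NS + "\">"
                + "<checks type=\"source\">"
                + "<check class=\"CheckLineLength\" name=\"Line length\" severity=\"warning\">"
                + "<property name=\"max\" value=\"72\" />"
                + "<property name=\"exclude\" value=\"D3\" />"
                + "</check></checks></naturalStyleCheck>";
        for (String check : listChecks(xml)) {
            System.out.println(check); // source/CheckLineLength max=72 exclude=D3
        }
    }
}
```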
(The default configuration file is here together with its XML schema.)

Existing Rules
The existing rules are described in the NaturalONE help topic Overview of NATstyle Rules, Error Messages and Solutions. Version 8.3 has 42 rules. These are only a few compared to PMD or SonarQube, which has more than 1000 rules available for Java. Here are some examples of what NATstyle can do:
  • Source Checks: e.g. limit line length, find tab characters, find empty lines, limit the number of source lines and check regular expressions against single source lines or the whole source file.
  • Source Header Checks: e.g. force header or check file naming convention.
  • Parser Checks: e.g. find unused local variables, warn if local variable shadows view, find TODO comments, calculate Cyclomatic and NPath complexity, force NATdoc (documentation) tags and check function, subroutine and class names against regular expressions.
  • Error (Message File) Checks: e.g. check error messages file name.
  • Resource (File) Checks: e.g. check resource file name.
  • Library (Folder) Checks: e.g. library folder conventions, find special folders, force group folders and warn on missing NATdoc library documentation.
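To make the Source checks more concrete, here is a guess at the core of a line length check like CheckLineLength with max set to 72. It is a sketch, not NATstyle's implementation; LineLengthSketch is a hypothetical name, and the exclude property from the configuration above is ignored because its exact meaning is not documented here.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a line length check; not NATstyle's CheckLineLength.
class LineLengthSketch {

    // Returns the 1-based numbers of all lines longer than max characters.
    static List<Integer> tooLongLines(String[] lines, int max) {
        List<Integer> violations = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].length() > max) {
                violations.add(i + 1);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Build an 81 character NATURAL line: WRITE 'xxx...x'
        StringBuilder longLine = new StringBuilder("WRITE '");
        while (longLine.length() < 80) {
            longLine.append('x');
        }
        longLine.append('\'');
        String[] source = { "WRITE 'short'", longLine.toString(), "END" };
        System.out.println(tooLongLines(source, 72)); // [2]
    }
}
```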
Configuring the same rule multiple times
Some rules, like Source/Regular expression for single source lines, allow only a single regular expression to be configured. Using alternation, e.g. a|b|c, in the expression is a way to overcome that, but the expression gets complicated quickly. Another way is to duplicate the <check> element in the NATstyle.xml configuration. Assume we want to forbid not only PRINT statements but also reductions to zero. (These rules do not make any sense, they are just here to explain the idea.) The relevant part of NATstyle.xml looks like this:
<checks type="source">
  <check class="CheckRegExLine"
         name="Regular expression for single source lines" ... >
    <property name="regex" value="PRINT '.*" />
  </check>
  <check class="CheckRegExLine"
         name="Regular expression for single source lines" ... >
    <property name="regex" value="REDUCE .* TO 0" />
  </check>
</checks>
While it is impossible to configure such duplicated rules in the NaturalONE preferences, it might be possible to run NATstyle from inside NaturalONE with a manually modified NATstyle.xml. I did not verify that. I execute NATstyle from the command line, passing in the configuration file name with the -c flag. (See the full configuration and script to run the rules from the command line.)
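One alternation and two separate checks should flag the same lines. The sketch below compares both approaches with java.util.regex. It assumes that a line is reported when the expression is found anywhere in it (Matcher.find); whether NATstyle uses find or a full match is not documented, so treat that as an assumption. The sample NATURAL lines are made up for the example.

```java
import java.util.regex.Pattern;

// Compares one alternation regex with two separate regexes.
// Assumption: a line is reported if the expression is found anywhere in it.
class RegexAlternationDemo {

    static final Pattern PRINT = Pattern.compile("PRINT '.*");
    static final Pattern REDUCE = Pattern.compile("REDUCE .* TO 0");
    // Both rules combined into one expression with alternation (a|b).
    static final Pattern COMBINED = Pattern.compile("PRINT '.*|REDUCE .* TO 0");

    // Two separate <check> elements: a line violates if either one matches.
    static boolean violatesSeparate(String line) {
        return PRINT.matcher(line).find() || REDUCE.matcher(line).find();
    }

    // A single <check> element with the combined expression.
    static boolean violatesCombined(String line) {
        return COMBINED.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "PRINT 'Hello'",
            "WRITE 'Hello'",
            "REDUCE ARRAY #A TO 0",
            "REDUCE DYNAMIC #TEXT TO 10"
        };
        for (String line : lines) {
            System.out.println(line + " -> separate=" + violatesSeparate(line)
                    + ", combined=" + violatesCombined(line));
        }
    }
}
```

With only two alternatives the combined expression is still readable; the separate checks pay off once each expression gets longer.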

Defining your own rules
There is no documented way to create new rules for NATstyle. All rule classes are defined inside the NATstyle plugin. The configuration XML contains a class attribute, which is a short name, e.g. CheckRegExLine. Its implementation is located in the package com.softwareag.naturalone.natural.natstyle.check.src.source, where source is the group of the rules defined in the type attribute of the <checks> element. I experimented a lot and did not find a way to load rules from packages other than com.softwareag.naturalone.natural.natstyle. So our own rules must be defined inside this namespace, too, which is possible, as nothing prevents us from declaring our own classes in a package of that name.
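A plausible way to turn the short class name into a class is sketched below: the package is derived from the type attribute of <checks> and the class is loaded by name. RuleClassNames is a hypothetical helper, and NATstyle's real lookup code is not visible, so this only illustrates why custom rules have to live in that package.

```java
// Guess at how a <check class="..."> could be resolved; not NATstyle's code.
class RuleClassNames {

    // Common prefix of the rule packages observed in NATstyle.
    static final String BASE_PACKAGE =
            "com.softwareag.naturalone.natural.natstyle.check.src";

    // type is the type attribute of <checks>, e.g. "source" or "parser".
    static String qualifiedName(String type, String shortClassName) {
        return BASE_PACKAGE + "." + type + "." + shortClassName;
    }

    // Loads the rule class by name, or returns null if it is not on the classpath.
    static Class<?> tryLoad(String type, String shortClassName) {
        try {
            return Class.forName(qualifiedName(type, shortClassName));
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(qualifiedName("source", "FindFooSourceRule"));
        System.out.println(tryLoad("source", "FindFooSourceRule") != null
                ? "found on classpath" : "not on classpath");
    }
}
```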

Source Rules
While I cannot see the actual code of the NATstyle rules, Java classes expose their public methods and parent class. I saw the names of the rule classes in the configuration, then guessed and experimented with the API a lot. My experience with other static analysis tools, e.g. PMD and Pylint, and the good method names of the NATstyle code helped me do so. A basic Source rule looks like this:
package com.softwareag.naturalone.natural.natstyle.check.src.source; // 1.

import com.softwareag.naturalone.natural.natstyle.NATstyleCheckerSourceImpl;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindFooSourceRule
  extends NATstyleCheckerSourceImpl { // 2.

  private Matcher name;

  @Override
  public void initParameterList() {
    name = Pattern.compile("FOO").matcher(""); // 3.
  }

  @Override
  public String run() { // 4.
    StringBuffer xmlOutput = new StringBuffer();

    String[] lines = this.getSourcelines(); // 5.
    for (int line = 0; line < lines.length; line++) {
      name.reset(lines[line]);
      if (name.find()) {
        setError(xmlOutput, line, "Message"); // 6.
      }
    }

    return xmlOutput.toString(); // 7.
  }
}
The marked lines are important:
  1. Because it is a Source rule, it must be in exactly this package - see the paragraph above.
  2. Source rules extend NATstyleCheckerSourceImpl, which provides the lines of the NATURAL source file (see 5.). It has more methods with reasonable names; use code completion to explore them.
  3. You initialise parameters in initParameterList. I did not figure out how to make the rules configurable from the XML configuration; that will probably happen in here, too.
  4. The run method is executed for each NATURAL file.
  5. NATstyleCheckerSourceImpl provides the lines of the file in getSourcelines. You can iterate the lines and check them.
  6. If there is a problem, call setError. Now setError is a bit weird, because it writes an XML element for the violation report XML (e.g. NATstyleResult.xml) into a StringBuffer.
  7. In the end, return the XML string of all found violations.
Finally, the rule is configured with
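Because NATstyleCheckerSourceImpl is only available inside NaturalONE, it can be handy to try the rule logic outside of it. The sketch below uses a hypothetical stand-in base class, SourceCheckerStub, that mimics the methods used above (getSourcelines, setError, initParameterList, run). The XML written by its setError is invented for the example and is not NATstyle's report format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for NATstyleCheckerSourceImpl. The method names mirror
// the rule above; the XML produced by setError is invented, NOT NATstyle's format.
abstract class SourceCheckerStub {
    private final String[] sourceLines;

    SourceCheckerStub(String[] sourceLines) {
        this.sourceLines = sourceLines;
    }

    public String[] getSourcelines() {
        return sourceLines;
    }

    // Appends one violation element to the report buffer.
    public void setError(StringBuffer xmlOutput, int line, String message) {
        xmlOutput.append("<error line=\"").append(line)
                 .append("\" message=\"").append(message).append("\"/>");
    }

    public abstract void initParameterList();

    public abstract String run();
}

// Same logic as FindFooSourceRule, but runnable without NaturalONE.
class FindFooSourceRuleDemo extends SourceCheckerStub {
    private Matcher name;

    FindFooSourceRuleDemo(String[] lines) {
        super(lines);
    }

    @Override
    public void initParameterList() {
        name = Pattern.compile("FOO").matcher("");
    }

    @Override
    public String run() {
        StringBuffer xmlOutput = new StringBuffer();
        String[] lines = getSourcelines();
        for (int line = 0; line < lines.length; line++) {
            name.reset(lines[line]); // reuse the matcher for every line
            if (name.find()) {
                setError(xmlOutput, line, "Found FOO");
            }
        }
        return xmlOutput.toString();
    }

    public static void main(String[] args) {
        FindFooSourceRuleDemo rule =
                new FindFooSourceRuleDemo(new String[] {"WRITE 'bar'", "#FOO := 1"});
        rule.initParameterList(); // called manually here
        System.out.println(rule.run()); // <error line="1" message="Found FOO"/>
    }
}
```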
<checks type="source">
  <check class="FindFooSourceRule"
         name="Find FOO"
         severity="warning" />
</checks>
(In the example repository, there is a working Source rule FindInv02.java together with its configuration customSource.xml.)

Parser Rules
Now it is getting more interesting. There are 18 rules of this type, which is a good start, but we need moar! Parser rules look similar to Source rules:
package com.softwareag.naturalone.natural.natstyle.check.src.parser; // 1.

import com.softwareag.naturalone.natural.natstyle.NATstyleCheckerParserImpl;
// other imports ...

public class SomeParserRule
  extends NATstyleCheckerParserImpl { // 2.

  @Override
  public void initParameterList() {
  }

  @Override
  public String run() {
    StringBuffer xmlOutput = new StringBuffer();

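    // create visitor; SomeVisitor is a placeholder for your own
    // class implementing INaturalASTVisitor
    INaturalASTVisitor visitor = new SomeVisitor();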
    getNaturalParser().getNaturalASTRoot().accept(visitor); // 3.
    // collect errors from visitor into xmlOutput

    return xmlOutput.toString();
  }
}
where
  1. Like Source rules, Parser rules must be defined under the package ...natstyle.check.src.parser.
  2. Parser rules extend NATstyleCheckerParserImpl.
  3. The NATURAL parser provides the AST of the NATURAL code, which is traversed with a visitor. Similar to other tools like PMD, NATstyle uses a visitor, the INaturalASTVisitor, which is called for each node in the AST.
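The visitor idea can be shown with a tiny, self-contained model. MiniNode, MiniToken, MiniGroup and MiniVisitor are invented for this sketch; the real interfaces (INaturalASTNode, INaturalASTTokenNode, INaturalASTVisitor) have many more node types and visit methods. Here each node's accept method drives the depth-first traversal, so calling accept once on the root visits the whole tree; whether NATstyle traverses exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Invented miniature types; the real NATstyle types are INaturalASTNode,
// INaturalASTTokenNode and INaturalASTVisitor.
interface MiniVisitor {
    void visitToken(MiniToken node);
    void visitGroup(MiniGroup node);
}

interface MiniNode {
    void accept(MiniVisitor visitor);
}

class MiniToken implements MiniNode {
    final String text;
    final int line;

    MiniToken(String text, int line) {
        this.text = text;
        this.line = line;
    }

    public void accept(MiniVisitor visitor) {
        visitor.visitToken(this);
    }
}

class MiniGroup implements MiniNode {
    final String kind;
    final List<MiniNode> children;

    MiniGroup(String kind, MiniNode... children) {
        this.kind = kind;
        this.children = Arrays.asList(children);
    }

    // Visits this node first, then all children depth-first.
    public void accept(MiniVisitor visitor) {
        visitor.visitGroup(this);
        for (MiniNode child : children) {
            child.accept(visitor);
        }
    }
}

// Records the line of every PRINT token, as a rule forbidding PRINT might do.
class FindPrintVisitor implements MiniVisitor {
    final List<Integer> printLines = new ArrayList<>();

    public void visitToken(MiniToken node) {
        if (node.text.equalsIgnoreCase("PRINT")) {
            printLines.add(node.line);
        }
    }

    public void visitGroup(MiniGroup node) {
        // nothing to check on group nodes
    }

    public static void main(String[] args) {
        // Hand-built tree for a comment line and the statement PRINT 3X 'Hello'.
        MiniNode root = new MiniGroup("ROOT",
            new MiniToken("* print with leading blanks", 1),
            new MiniToken("PRINT", 2),
            new MiniToken("3X", 2),
            new MiniGroup("OPERAND",
                new MiniGroup("SIMPLE_CONSTANT_REFERENCE",
                    new MiniToken("'Hello'", 2))));
        FindPrintVisitor visitor = new FindPrintVisitor();
        root.accept(visitor);
        System.out.println("PRINT found on lines " + visitor.printLines);
    }
}
```

Unlike a line-based regular expression, the visitor compares whole tokens, so the word print inside the comment is not reported.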
Using the Parser
The visitor must implement INaturalASTVisitor in package com.softwareag.naturalone.natural.parser.ast.internal. This interface defines 48 visit methods for the different subtypes of INaturalASTNode, e.g. array indices, comments, operands, system function references like LOOP or TRIM, and so on. Still, there are never enough node types: the AST does not convey much information about the code, and most statements end up as INaturalASTTokenNode. For example, the NATURAL lines
* print with leading blanks
PRINT 3X 'Hello'
which are a line comment and a print statement, result in the AST snippet
+ TOKEN: * print with leading blanks
+ TOKEN: PRINT
+ TOKEN: 3X
+ OPERAND
  + SIMPLE_CONSTANT_REFERENCE
    + TOKEN: 'Hello'
PRINT is a statement and could be recognised as one, yet it ends up as a plain TOKEN just like the comment, and 'Hello' is a string constant. This makes defining custom rules possible but pretty hard. To help me understand the AST, I created a visitor which dumps the tree as an XML file, similar to PMD's designer: DumpAstAsXml.java.
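The following is a simplified sketch of such a dumper, not the DumpAstAsXml.java mentioned above. To stay short it uses a hypothetical AstNode class and a plain recursive method instead of an INaturalASTVisitor; the element and attribute names are chosen for the example only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical AST node: a kind (e.g. TOKEN, OPERAND) plus optional token text.
// Invented for this sketch; it is not NATstyle's INaturalASTNode.
class AstNode {
    final String kind;
    final String text; // null for non-token nodes
    final List<AstNode> children;

    AstNode(String kind, String text, AstNode... children) {
        this.kind = kind;
        this.text = text;
        this.children = Arrays.asList(children);
    }

    static AstNode token(String text) {
        return new AstNode("TOKEN", text);
    }
}

class AstXmlDumper {

    // Escapes characters that are not allowed verbatim in a double-quoted attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Writes one element per node, indented by depth, children nested inside.
    static void dump(AstNode node, int depth, StringBuilder out) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        out.append(indent).append('<').append(node.kind);
        if (node.text != null) {
            out.append(" text=\"").append(escape(node.text)).append('"');
        }
        if (node.children.isEmpty()) {
            out.append("/>\n");
            return;
        }
        out.append(">\n");
        for (AstNode child : node.children) {
            dump(child, depth + 1, out);
        }
        out.append(indent).append("</").append(node.kind).append(">\n");
    }

    static String toXml(AstNode root) {
        StringBuilder out = new StringBuilder();
        dump(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The AST for the comment and PRINT 3X 'Hello', rebuilt by hand.
        AstNode root = new AstNode("ROOT", null,
            AstNode.token("* print with leading blanks"),
            AstNode.token("PRINT"),
            AstNode.token("3X"),
            new AstNode("OPERAND", null,
                new AstNode("SIMPLE_CONSTANT_REFERENCE", null,
                    AstNode.token("'Hello'"))));
        System.out.print(toXml(root));
    }
}
```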

Conclusion
With this information you should be able to get started defining your own NATstyle rules. There is always so much more we could and should check automatically.

2 comments:

Unknown said...

Interesting to see this possibility of extending NatStyle! We took a look at NatStyle a while ago and quickly decided that it wouldn't fit our needs - especially due to the lack of custom rules.

In the meantime, one of our students has developed a real "linter" or code checker for Natural with access to the AST and custom rules that can be integrated in SonarQube and even instant feedback inside Eclipse just like with our Java projects. We are in the middle of rolling out this solution in our company.

If you're interested in taking a look at it, feel free to contact me. We're planning to also release the tool as open source.

Best regards,
Stefan

Peter Kofler said...

Stefan,
this is great news. Integration with SonarQube is a must. I was merely looking for options. Yes, please keep me in the loop.