15 August 2018

Creating your own NATstyle rules

Last month I showed how to use NATstyle from the command line. NATstyle is the utility to define and check the coding standard of your NATURAL program. Today I want to explain how to customize and create your own rules beyond what is explained in the manual. I used NaturalONE 8.3.5.0.242 CE (November 2016). Likely there are more options and rules available in newer versions of NaturalONE and NATstyle.

Basic Configuration
NATstyle comes packaged inside NaturalONE, the Eclipse-based IDE for NATURAL. As expected NATstyle can be configured in Eclipse preferences. The configuration is saved as NATstyle.xml which is used when you run NATstyle from the right click popup menu. We will need to modify NATstyle.xml later, so let's have a look at it:
<?xml version="1.0" encoding="utf-8"?>
<naturalStyleCheck version="1.0"
                   xmlns="http://softwareag.com/natstyle/rules"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://softwareag.com/natstyle/rules checks.xsd">
  <checks type="source">
    <check class="CheckLineLength" name="Line length" severity="warning">
      <property name="max" value="72" />
      <property name="exclude" value="D3" />
    </check>
    <!-- more checks of type source -->
  </checks>
  <!-- more checks of other types -->
</naturalStyleCheck>
(The default configuration file is here together with its XML schema.)

Existing Rules
The existing rules are described in NaturalONE help topic Overview of NATstyle Rules, Error Messages and Solutions. Version 8.3 has 42 rules. These are only a few compared to PMD or SonarQube, which has more than 1000 rules available for Java. Here are some examples what NATstyle can do:
  • Source Checks: e.g. limit line length, find tab characters, find empty lines, limit the number of source lines and check a regular expressions for single source lines or whole source file.
  • Source Header Checks: e.g. force header or check file naming convention.
  • Parser Checks: e.g. find unused local variables, warn if local variable shadows view, find TODO comments, calculate Cyclomatic and NPath complexity, force NATdoc (documentation) tags and check function, subroutine and class names against regular expressions.
  • Error (Message File) Checks: e.g. check error messages file name.
  • Resource (File) Checks: e.g. check resource file name.
  • Library (Folder) Checks: e.g. library folder conventions, find special folders, force group folders and warn on missing NATdoc library documentation.
Same rule multiple times configured differently
Some rules like Source/Regular expression for single source lines only allow a single regular expression to be configured. Using alternation, e.g. a|b|c, in the expression is a way to overcome that, but the expression gets complicated quickly. Another way is to duplicate the <check> element in the NATstyle.xml configuration. Assume we do not only forbid PRINT statements, we also do not allow reduction to zero. (These rules do not make any sense, they are just here to explain the idea.) The relevant part of NATstyle.xml looks like
<checks type="source">
  <check class="CheckRegExLine"
         name="Regular expression for single source lines" ... >
    <property name="regex" value="PRINT '.*" />
  </check>
  <check class="CheckRegExLine"
         name="Regular expression for single source lines" ... >
    <property name="regex" value="REDUCE .* TO 0" />
  </check>
</checks>
While it is impossible to configure these rules in the NaturalONE preferences, it might be possible to run NATstyle with these modified settings. I did not verify that. I execute NATstyle from the command line passing in the configuration file name using the -c flag. (See the full configuration and script to to run the rules from the command line.)

rock piles of different sizeDefining your own rules
There is no documented way to create new rules for NATstyle. All rules' classes are defined inside the NATstyle plugin. The configuration XML contains a class attribute, which is a short name, e.g. CheckRegExLine. Its implementation is located in the package com.​softwareag.​naturalone.​natural.​natstyle.​check.​src.​source where source is the group of the rules defined in the type attribute of the <checks> element. I experimented a lot and did not find a way to load rules from other packages than com.​softwareag.​naturalone.​natural.​natstyle. All rules must be defined inside this name space, which is possible.

Source Rules
While I cannot see the actual code of NATstyle rules, Java classes expose their public methods and parent class. I did see the names of the rule classes in the configuration and guessed and experimented with the API a lot. My experience with other static analysis tools, e.g. PMD and Pylint and the good method names of NATstyle code helped me doing so. A basic Source rule looks like that:
package com.softwareag.naturalone.natural.natstyle.check.src.source; // 1.

import com.softwareag.naturalone.natural.natstyle.NATstyleCheckerSourceImpl;
// other imports ...

public class FindFooSourceRule
  extends NATstyleCheckerSourceImpl { // 2.

  private Matcher name;

  @Override
  public void initParameterList() {
    name = Pattern.compile("FOO").matcher(""); // 3.
  }

  @Override
  public String run() { // 4.
    StringBuffer xmlOutput = new StringBuffer();

    String[] lines = this.getSourcelines(); // 5.
    for (int line = 0; line < lines.length; i++) {
      name.reset(lines[line]);
      if (name.find()) {
        setError(xmlOutput, line, "Message"); // 6.
      }
    }

    return xmlOutput.toString(); // 7.
  }
}
The marked lines are important:
  1. Because it is a Source rule, it must be in exactly this package - see the paragraph above.
  2. Source rules extend NATstyleCheckerSourceImpl which provides the lines of the NATURAL source file - see line 6. It has more methods, which have reasonable names, use the code completion.
  3. You initialise parameters in initParameterList. I did not figure out how to make the rules configurable from the XML configuration, which will probably happen in here, too.
  4. The run method is executed for each NATURAL file.
  5. NATstyleCheckerSourceImpl provides the lines of the file in getSourcelines. You can iterate the lines and check them.
  6. If there is a problem, call setError. Now setError is a bit weird, because it writes an XML element for the violation report XML (e.g. NATstyleResult.xml) into a StringBuffer.
  7. In the end the return the XML String of all found violations.
Finally the rule is configured with
<checks type="source">
  <check class="FindFooSourceRule"
         name="Find FOO"
         severity="warning" />
</checks>
(In the example repository, there is a working Source rule FindInv02.java together with its configuration customSource.xml.)

Parser Rules
Now it is getting more interesting. There are 18 rules of this type, which is a good start, but we need moar! Parser rules look similar to Source rules:
package com.softwareag.naturalone.natural.natstyle.check.src.parser; // 1.

import com.softwareag.naturalone.natural.natstyle.NATstyleCheckerParserImpl;
// other imports ...

public class SomeParserRule
  extends NATstyleCheckerParserImpl { // 2.

  @Override
  public void initParameterList() {
  }

  @Override
  public String run() {
    StringBuffer xmlOutput = new StringBuffer();

    // create visitor
    getNaturalParser().getNaturalASTRoot().accept(visitor); // 3.
    // collect errors from visitor into xmlOutput

    return xmlOutput.toString();
  }
}
where
  1. Like Source rules, Parser rules must be defined under the package ...natstyle.​check.​src.​parser.
  2. Parser rules extend NATstyleCheckerParserImpl.
  3. The NATURAL parser traverses the AST of the NATURAL code. Similar to other tools, NATstyle uses a visitor, the INaturalASTVisitor. The visitor is called for each node in the AST tree. This is similar to PMD.
Using the Parser
Tree, MuthillThe visitor must implement INaturalASTVisitor in package com.​softwareag.​naturalone.​natural.​parser.​ast.​internal. This interface defines 48 visit methods for the different sub types of INaturalASTNode, e.g. array indices, comments, operands, system function references like LOOP or TRIM, and so on. Still there are never enough node types as the AST does not convey much information about the code, most statements end up as INaturalASTTokenNode. For example the NATURAL lines
* print with leading blanks
PRINT 3X 'Hello'
which are a line comment and a print statement, result in the AST snippet
+ TOKEN: * print with leading blanks
+ TOKEN: PRINT
+ TOKEN: 3X
+ OPERAND
  + SIMPLE_CONSTANT_REFERENCE
    + TOKEN: 'Hello'
Now PRINT is a statement and could be recognised as one and 'Hello' is a string. This makes defining custom rules possible but pretty hard. To help me understand the AST I created a visitor which dumps the tree as XML file, similar to PMD's designer: DumpAstAsXml.java.

Conclusion
With this information you should be able to get started defining your own NATstyle rules. There is always so much more we could and should check automatically.

13 August 2018

This is the last time

It is August, it is hot and it is time to do some alternate work:

Bare Brickwork
Quick Fix
Looking at the picture made me think. Every industry seems to have its quick fixes: Engineers use duct tape and WD-40. In software development, we take short cuts by violating existing structure, e.g. adding another conditional resulting in convoluted logic. And in the construction industry it clearly is the usage of polyurethane or spray foam. As every tool it has its use, and like in the picture it is especially useful for temporary work. Unfortunately most professional construction workers I commissioned wanted to use it for everything - like duct tape: A hole in the wall? Why use brick and mortar when it is faster to put some foam into it. A gap in the boards? Why care to work accurately when it is easier to close the gap with foam afterwards. Not enough steam brake to cover the ceiling? Let's fill the remaining area with foam. You get the idea. While some people like the speed such quick fixes provide, e.g. Joel Spolsky praises the Duct Tape Programmer, I do not agree. Anyway, this project must come to an end and I sincerely hope it is the last time. ;-)

31 July 2018

Strong Opinions

As Code Cop I meet a lot of developers and I have to tell some of them that their code is crap and that they need to drive down their technical debt. Others need to test more, maybe start doing TDD. There are a few who would not listen. Some of them are even proud of stubbornly not listening, they say that they have strong opinions about the topic. They do not seem interested in in another view on their code or design? Why is that? I have some ideas.

GorillaDunning Kruger Effect
The Dunning Kruger Effect is a cognitive bias in which people of low ability have illusory superiority and mistakenly assess their cognitive ability as greater than it is. In other words, beginners believe themselves senior and think that they know better. There is no need to listen to an outsider.

I know this effect myself: When I learnt my first programming language(s), as soon as I could write a few lines of code, I felt invincible. (I thought that) I could do everything. I ruled. I knew how to do it and I knew I was right and that there was no other way to do it properly. Today, after writing code for 20 years in more than 25 languages, I do not feel like that any more. Sometimes I would like to go back to my state of mind of the late nineties and I try to look at new languages like a child - with a beginner's mind - but I know too much details. I know there will be nasty details, even for that newest and hottest languages of today.

Expert Beginner
An expert beginner has quickly reached (what looks like) expert status. He or she voluntarily ceases to improve because of a belief that expert status has been reached and thus further improvement is not possible. (I recommend reading the whole series of Erik Dietrich, How Developers Stop Learning: Rise of the Expert Beginner and further posts of his series.) Expert beginners make up defending arguments because they are the experts.

I have met some of these. I vividly remember two senior developers who had been with one of my clients for more than 16 years. They were opposing me on everything I said. Sure they had a superior knowledge of the application they had built and maintained for so many years, and they knew the domain they were working in pretty well. I am sure they were adding value to the product. They both were strong influencers. As soon as they would speak their mind, all other team members would just agree. But, but, but... Sigh. They were still using Ant, did not know Maven, did not write unit tests, did not care for clean code and so on - they made me very unhappy.

Quadrants of Knowledge
The quadrants of knowledge seem connected to my previous points. The four quadrants are
  1. Known Knowns: What you know that you know
  2. Unknown Knowns: What you do not know that you know
  3. Known Unknowns: What you know that you do not know
  4. Unknown Unknowns: What you do not know that you do not know
When we advance our career in software delivery, we learn: We collect concrete knowledge (quadrant 1), we gain experience which allows us to have gut feelings about some things (maybe quadrant 2) and we hear about things that we have no idea about (quadrant 3). There are gazillion things in software that I have no idea of: ANTLR, APL, AWS, Category Theory, Distributed Computing, Elm, Gradle, Guava, Haskell, HBase, J2EE, Kubernetes, Mongo DB, Networking, Node.JS, Prolog, Security - these are just the first few that come to my mind. It is easier to list things which I do not know in areas that I do know. I have been Java developer for 15 years and there are so many things in the Java ecosystem which I have no clue about. On the other hand there are not many things I can say about C#, because I only know it little.

So the more we learn, the more we know that we do not know, increasing our Known Unknowns (quadrant 3). The philosopher Confucius (551–479 BC) even said, Real knowledge is to know the extent of one's ignorance. Like the Dunning Kruger Effect, beginners have no idea about the vast size of Known Unknowns quadrant and do not expect surprises or potential improvements.

Strong Opinions, Weakly Held
The ideas of the Dunning Kruger Effect, Expert Beginners and Known Unknowns offer some explanation why some junior developers are not open for discussions. I have also met senior and expert level developers who are the same. When attending Software Crafting unconferences, e.g. SoCraTes, attendees are eager to learn and open for discussion. After all, that is the idea of unconferences, right? Still, many of them have strong opinions about almost everything, e.g. Spring Boot is cool. No, Spring Boot is hell. ;-) Different from beginners, all of them know the problems of strong opinions and claim that they only hold onto them weakly.

Deal with itWeakly Held, what is that supposed to mean? Isaac Morehouse explains in These Four Words Will Help You ‘Hold Strong Opinions Weakly': You act as if they [your opinions] are true unless and until it is proven they are not. Maybe people like strong opinions because they give them the power of definiteness. Choices are much easier and life is more predictable with definiteness and absolutes. Holding opinions weakly also gives the power of openness. This sounds hard: Basing one's actions on some definite "truths" and re-evaluating these truths whenever they are challenged. Seems like a difficult balancing act, if not a contradiction. For sure that state of mind is not easy to get into. I still need to see someone with strong opinion - weakly held or not - to change his or her mind,

Rant
When I started this article, it was supposed to be a rant. As usual, writing down my thoughts helped me to structure the material in a better way. (I consider writing an act of learning.) So where is the rant? Here it is:

Person with strong opinion == arsehole who does not want to listen.

That is a bit harsh, I agree. Obviously beginners need to listen more and due to Known Unknowns I expect experienced and expert developers to be open to discussion on any topic at any time. The strong opinion itself is not the problem, but the resulting behaviour, especially if strong opinion is used as warning or justification of some sort. I witness strong opinion mainly as excuse to not discuss or challenge existing views.

What about me?
Of course I had strong opinions during my professional life. I did not earn the nick name of Code Cop for nothing. Someone even called me Code Nazi. And my opinions were definitely not weakly held. I like absolutes. (Now I see - as I just learned above - the power of definiteness.) I guess I still have some strong opinions, and they are not just opinions, they are dogma: a system of principles proclaimed as unquestionable.

I have changed and it seems that the Code Cop is getting soft. ;-) The extended experience and discussions have weakened my definiteness. There is always some special case, the almighty it depends. And since I am working with people, and try to help them, I am more compassionate. I understand their problems, the difficulty to get time for quality and the pain of legacy code. Sometimes I catch myself not recommending what I believe the technically best course of action because it will be a lot of unpleasant, hard work for the team. No, that is too soft. If they made a mess, they have to fix it. Time to fall back on dogma. (Sound of firing cannons in the background ;-) All the time these balancing acts...