13 July 2018

Categories of Architectural Refactoring

At the end of one of my refactoring workshops, we started discussing how to refactor
larger structures, sometimes even changing the architecture of a system. During that discussion we came up with four different categories of larger refactoring which use different techniques and serve different goals:
  • Replace whole parts
  • Change the structure of components
  • Similar change all over the place
  • Other changes
Seoul SkyscrapersWe struggled to find good names, so let me explain the different things a bit. When I talk about architecture, I mean Software Architecture of a single application, i.e. 1. the different software components and 2. their (inter-) dependencies making up the application. Components are sub-systems and usually contain one or more namespaces or packages. While we can discuss forever what exactly software architecture is and is not, I like the definition of IEEE which in addition to elements (1.) and relationships (2.) adds principles of its design and evolution. I describe these principles as 3. design guidelines which include smaller recommended usage patterns and coding conventions.

Martin Fowler and Neal Ford say that architecture is the stuff that's hard to change later and replacing parts of the architecture involves a lot of work: Imagine (the highly theoretical example of) changing the programming language a system is written in or (a more common example of) moving from a monolith to a Microservice based architecture. These two examples already fall into different categories.

Replacing whole parts of the architecture
Massive changes like changing the UI technology or targeting another platform are possible but a lot of code has to be rewritten. Substitute Algorithm is a classic Fowler refactoring, and so is Substitute Architectural Decision. (This is my third try to name these category of changes.) Architectural Decisions are made to support non-functional requirements like performance, security, connectivity etc and when revisiting them later - with more domain knowledge - better alternatives might come up. Real examples of changing Architectural Decisions include:
  • Replacing one relational database system with another
  • Replacing your relational database with a NoSQL one
  • Replacing UI technologies, e.g. moving from JSF to Vaadin
  • Changing API technologies, e.g. moving from SOAP to REST
  • Switching from a fat client to a client-server architecture
Depending on used frameworks and abstractions, some of these changes will be easy, others will not. Even when the necessary change is isolated, it likely touches more than one architectural component. For example moving to another data store might need changing all data access objects in all components which access the data store. The traditional approach is to change everything at once ("Big Leap") but it is possible to refactor the architecture in a safe way, using small steps. In Responsive Design (QCon 2009) Kent Beck describes several strategies to perform such changes:
  • Leap: Make the new design and use it immediately. (Only if the change is not too big.)
  • Parallel Change: Make the new design and run both new and old simultaneously.
  • Stepping Stone: Create a platform to bring the desired new design within reach.
  • Simplification: Only implement part of the design now and add complexity bit-by-bit later.
  • Hack it: Just do it and pay price later. (This is maybe not a real strategy ;-)
Changing the structure of architectural components
The next category is similar to Substitute Architectural Decision but is focused on the architectural building blocks - the components. Some examples are:
  • Introduce or extract new component, module or package
  • Remove or invert dependencies between components, e.g. remove cyclic dependencies
  • Introduce layering, add a new layer, clean up layering violations
  • Weaken potential trouble spots in the architecture, e.g. Butterfly, Breakable, Hub and Tangle
foundationThis category of changes is the "classic" Architectural Refactoring because that is how people call it. For example - in accordance to Martin Fowler's definition - Dave Adsit defines Architectural Refactoring as intentionally changing the structure of a system without altering its features. Similarly Sean Barow explains Architectural Refactoring as the deliberate process to remove architectural smells while not changing the scope or functionality of the code. So it is always structural and deals with components (1.), their relationships (2.) and their inner structure. Let's call it Refactor Architectural Structure to distinguish it from other architectural refactoring.

Sean Barow uses the term Code Refactoring for the refactoring defined by Fowler. Code refactoring focuses on software entities like classes, methods, fields and variables. Architectural refactoring involves code refactoring, which leads to confusion between the two. Working with dependencies needs extracting or moving classes, methods or fields. We can work each class or dependency at once, which supports an incremental approach.

Performing similar changes throughout the whole project
This is another category and it feels like Shotgun Surgery: A usually small change has to be repeated all over the place. (While Shotgun Surgery is a code smell for feature logic, it is unavoidable if not required for consistent structure and usage defined by the architecture.) For example let's assume we want to replace EasyMock with Mockito. Both libraries serve the same purpose and are used in a similar way, but their APIs are different. (Facing such a problem indicates that we are coupled to much to a used library - maybe we should have wrapped it, maybe not. But that discussion is outside the scope of this article.) Converting a single usage of EasyMock to Mockito is straight forward: We change the method invocations and execute the calls earlier or later in the test. The only problem is that we have hundreds of such cases. More examples of similar "Shotgun" changes are:
  • Upgrading a major version of a library with breaking API changes
  • Migrating between similar frameworks or libraries
  • Changing or unifying coding conventions and fixing violations
  • Consistently applying or changing aspects like logging or security
  • Applying internationalization
These changes do not change the overall structure, they just work on the implementation. They are similar to Refactor Architectural Structure, just deal with design guidelines, coding conventions and smaller usage patterns - item 3 from the definition of Software Architecture in the beginning. They are code refactoring applied in many places consistently across a whole code base.

These category is important, because some steps of both Substitute Decision and Refactor Structure need similar changes. The main challenge of these changes is the high number of occurrences. Changing conventions and aspects is a piece by piece work and can be done manually. Nevertheless some automation tool or script would help a lot. Even converting only basic scenarios, which often cover up to 80%, is a big win. Unfortunately some changes, e.g. migrating libraries, need to be done all at once. In such cases a Stepping Stone approach has worked well for me in the past: First wrapping the library piece by piece - incrementally - and then changing the wrapper and all invocations at once. Creating the wrapper is a lot of work. Again, automated migration removes the need for a wrapper. There are some options available, which I will cover in a future article.

I have not found a good name for this category. I used to call it Large Scale Refactoring, but I am not happy with that name. It is a similar or identical code refactoring applied to many places widespread throughout the code. It is a Widespread Architectural Change.

Other Architectural Changes
During our initial discussion, we also considered other refactoring. To be on the safe side, always allow a category of unknown things ;-). At the moment I have no real examples for them. These are smaller changes neither related to architectural decisions or structure nor are they widespread. I believe we find no examples because these are plain code changes and we do not consider them architectural even if they are related to the architecture of the applications in a way.

FavelaSummary
Next to the code we also can and should refactor the architecture of our applications. Architectural changes are expensive, still sooner or later things need to change if the application is in service just long enough. There are at least four categories of architectural refactoring:
  • Substitute Architectural Decision
  • Refactor Architectural Structure
  • Widespread Architectural Change
  • Other Architectural Changes
Next time I will dive into tools to automate Widespread Architectural Changes.

3 July 2018

Using NATstyle from the commandline

Last year a new client contracted me to help them make their development process, their development environment and tooling "state of the art" (aka up do date). From the coding perspective that included - among other things - using a decent IDE, writing unit tests, having some Continuous Delivery pipeline in place and using static code analysis. I thought that it should not be too difficult, right? It was, but that is not the goal of this article. Today I want to describe the static analysis features and how to use it.

Earth Sapphire Progress Picture 30 july-2011NATURAL
The code base was NATURAL. NATURAL is an application development and deployment environment using a proprietary language maintained by Software AG. It was created in 1979. André describes it as Cobol's ugly, brain-damaged, BABBLING IN ALL-CAPS - but regrettably still healthy and strong - cousin.. ;-) Well, it is a procedural language structured using modules, subroutines and functions. It feels as old as it is. For example, module names are limited to eight (8!) characters and there is no way to structure the modules further. An enterprise application might contain 5.000 to 10.000 modules. To me NATURAL feels much like early Pascal: units (modules), subroutines and functions together with the eight characters DOS name limit. I really liked Pascal back then and I do not feel that bad about NATURAL. Probably also because I do not have to use it on a daily basis.

NaturalONE and NATstyle
NaturalONE is the Eclipse-based tool for NATURAL by Software AG. It looks pretty good, similar to what you would expect from Eclipse. It contains the usual features like Outline, Search, Debug and something called NATstyle. Now NATstyle - which probably on purpose sounds similar to the well known Checkstyle - is an utility to define and check the coding standard in your programs. It is used inside NaturalONE, see a NATstyle Demo by Software AG. (For more information refer to the Eclipse/NaturalONE help topic Checking Natural Code with NATstyle.)

Invoking NATstyle from outside NaturalONE
I like static code analysis and consider it a mandatory component of every delivery pipeline. As I said above, I wanted to have some static code analysis and was wondering if I would be able to run NATstyle in the pipeline? I did not find any information on the topic and was forced to experiment. I used NaturalONE 8.3.5.0.242 CE (November 2016) and figured out several ways, which were hardly documented, and might be subject to change. Likely there are other options available in newer versions of NaturalONE and NATstyle.

Invoking NATstyle from command line
NATstyle is just an Eclipse plugin inside NaturalONE. In my edition it is the folder C:\Natural-CE\ Designer\ eclipse\ plugins\ com.softwareag.naturalone.natural.natstyle_8.3.5.0000-0242. With the proper class path it is possible to invoke NATstyle's main method, which prints help on its usage:
Usage: com.softwareag.naturalone.natural.natstyle.NATstyle [-options]

where options include:
  -projectpath <directory> Specify where to find the Natural project.
  -rootfolder              Library root folder support enabled.
  -c <file>                Specify the configuration file.
  -o <file>                Specify the output file.
  -sourcefiles <srcList>   Specify list of source files to be loaded.
                           separated with ;
  -libraries <libList>     Specify list of libraries to be loaded.
                           separated with ;
  -exclude <libList>       Specify a list of libraries to exclude.
                           separated with ;
  -p <directory>           Specify where to find additional packages.
  -help                    Display command line options and exit.

Make sure the following classes can be found in your class path:

com.softwareag.naturalone.natural.auxiliary
com.softwareag.naturalone.natural.common
com.softwareag.naturalone.natural.parser
Nice, that looks like the developers from Software AG prepared NATstyle to be called stand-alone in my pipeline. +1. All the necessary classes can be found in the following plugins of Eclipse/NaturalONE:
  • Folder com.softwareag.naturalone.natural.natstyle_8.3.5.0000-0242
  • Jar com.softwareag.naturalone.natural.auxiliary_*.jar
  • Folder com.softwareag.naturalone.natural.common_8.3.5.0000-0242
  • Jar com.softwareag.naturalone.natural.parser_*.jar
  • Jar org.eclipse.equinox.common_*.jar
These are all bundles defined by Eclipse/OSGi. The list of dependencies of these bundles also contains the following Jars
  • org.eclipse.swt.win32.*.jar
  • org.eclipse.ui.console_*.jar
which (in my experience) are not needed to use NATstyle. NATstyle does not use any other OSGi features and we can set the needed class path by hand. Here is an example how to do that in the Windows shell when NaturalONE is installed:
set N1=C:\Natural-CE\Designer\eclipse\plugins
set CLASSPATH=^
%N1%\com.softwareag.naturalone.natural.natstyle_8.3.5.0000-0242;^
%N1%\com.softwareag.naturalone.natural.auxiliary_8.3.5.0000-0242.jar;^
%N1%\com.softwareag.naturalone.natural.common_8.3.5.0000-0242;^
%N1%\com.softwareag.naturalone.natural.parser_8.3.5.0000-0242.jar;^
%N1%\org.eclipse.equinox.common_3.6.200.v20130402-1505.jar

java com.softwareag.naturalone.natural.natstyle.NATstyle %*
See run_natstyle for the script I used to run NATstyle directly. Copying the needed folders and Jars to another machine works well. I recommend to create a Jar from the folders before copying them around. create_jar how to do that. Make sure not to include META-INF\*.RSA or META-INF\*.SF into the new Jars because the checksums do not match. copy_jars does the actual copying into the lib folder.

Invoking NATstyle from Ant
NaturalONE supports Ant for some automation tasks. There is no specific NATstyle Ant task, but calling it from Ant is straight forward. If the path <path id="natstyle.classpath"> is set up to contain all the Folders and Jars, NATstyle is executed via Ant's java task:
<java taskname="natstyle"
      classname="com.softwareag.naturalone.natural.natstyle.NATstyle"
      dir="${basedir}" fork="true" maxmemory="128m"
      failonerror="true"
      classpathref="natstyle.classpath">
  <arg value="-projectpath" />
  <arg value="../NatUnit/NatUnit_L4N" />
  ...
</java>
A new JVM needs to be forked because NATstyle calls System.exit() at the end. All parameters are the same as for the command-line invocation and are passed in via the <arg /> element. See ant_NatStyle.xml for the complete Ant script.

Reporting
NATstyle writes an XML report of all files if worked and found rule violations. The structure is
<NATstyle>
  <file type='...' location='...'>
    <error severity='...' message='...' />
    ...
  </file>
  ...
</NATstyle>
Inside the folder of NATstyle (i.e. com.softwareag.naturalone.natural.natstyle_8.3.5.0000-0242) there is a subfolder NATstyle. It contains sample files. Using the provided NATstyleSimple.xsl the report XML (e.g. NATstyleResult.xml) can be transformed into a human-readable format (e.g. NATstyleSimple.html). When using NATstyle stand-alone, you might want to copy this folder as well. XSLT transformation task is part of Ant and converts the XML to HTML.
<xslt basedir="${basedir}"
      destdir="${basedir}/report"
      style="${basedir}/NATstyle/NATstyleSimple.xsl">
  <include name="NATstyleResult_*.xml" />
</xslt>
See ant_NatStyleResultToHtml.xml for the complete Ant build script to convert the NATstyle result XML into a readable HTML report. Using your own stylesheet (.xsl) allows you to customize the reports.

29 June 2018

xUnit Testing Koans Reloaded

Koans are sequences of little exercises to help you "on your way to enlightenment". Testing Koans are similar exercises to help you learn unit testing. Usually the code is in place, but the assertions are missing. xUnit Koans are Testing Koans for SUnit inspired testing frameworks, like JUnit, minitest, NUnit, PHPUnit, xUnit.net and more. They are introductory exercises which cover the most basic usage of assertions and life cycle methods.

JUnit5 Koans RetrospectiveJUnit 4 and PHPUnit
Initially I created a few JUnit Testing Koans in 2012 as part of a JUnit training while I was with IBM. I wanted the training to be hands-on and interactive. (These koans are in the junit-koans repository.) In 2015 I picked up the idea and ported the code to PHP to run a PHPUnit workshop for my local community. (These koans are in the phpunit-koans repository.)

JUnit 5
In 2017, with the new version of JUnit, Görge Albrecht, the Code Mentor, helped me add koans for new features of JUnit 5. We used the koans as hands-on exercises which were a part of our JUnit 5 introductory workshop, delivered at Topconf Linz 2017 and other conferences. The exercises worked well and gave an overview of the (old and) new API and showed how to use it. People liked the workshop and had fun, see the points made during the retrospective of one of the workshops on the right. (The code is in the junit5-koans repository.)

Python's unittest
It seems that the koans keep coming back to me ;-) - and I keep coming back to them, always adding more exercises and languages. This year I ported them to Python, which took me around five hours. I ran them as a short, Pragmatic Introduction to Python Unit Testing at PyDays 2018. I liked PyDays for its enthusiastic crowd and consider the workshop successful because in the final feedback round several participants reported happily that they had "written their first test" and that "it's not that hard!". Even experienced participants reported some learnings, like "I never used skip" or "I did not know how to check the message in asserting exceptions". (The latest and greatest koans are in the python-unittest-koans repository.)

Teaching xUnit
Koans, i.e. learning progress verification by assertions, work well in both static and dynamic languages. There is no difference in workshops using Java and PHP or Python. Testing Koans are a suitable exercise to help people with their first steps in unit testing. In short two hours, people who have never seen a test will learn the concepts and flow of unit testing. I usually start with a few slides, about 20 minutes of theory about structure of xUnit tests before I ask people to work with the exercises.

JUnit5 handouts by GörgeLike with JUnit 5, Testing Koans also work well to introduce people to new testing framework APIs and their usage. People familiar with JUnit 4 will need no more than a migration guide, see Görge's handouts on the right, to get their first impression of the new features. Some more advanced aspects will require them to research the documentation, but that is part of the exercise.

People appreciate the "focused, step by step nature" of koans. The "exercise is clear" and "invites to explore xUnit". And most importantly "it was fun". In all the feedback I have collected for my xUnit Testing Koans, fun is dominant. And fun helps learning. (All quotes were taken as is from actual feedback I got.)

Moar Testing Koans!
Most xUnit frameworks are consistent with xUnit, and are similar in their usage. It is easy to port the koans to other languages, at least for basic examples. The same is true for newer versions. I should upgrade some koans to accommodate newer features of the used frameworks, on the other hand the basic features are still the same. I would love to see more ports and also Koans for different testing styles, e.g. RSpec or Jasmine Testing Koans.

Current List of Koans

25 February 2018

Complete Cofoja Setup Example

Design by Contract
Have you heard of Design by Contract (short DbC)? If not, here is a good introduction from Eiffel. (In short, Design by Contract is one of the major mechanisms to ensure the reliability of object-oriented software. It focuses on the communication between components and requires the interactions to be defined precisely. These specifications are called contracts and they contain Preconditions, Postconditions and Invariants. Unlike using assertions to ensure these conditions, DbC considers the contracts important parts of the design process which should be written first. It is a systematic approach to building bug-free object-oriented systems and helps in testing and debugging.)

Cofoja (Contracts for Java)
Cofoja is a Design by Contract library for Java. It uses annotation processing and byte code instrumentation to provide run-time checking. It supports a contract model similar to that of Eiffel, with added support for a few Java-specific things, such as exceptions. In Cofoja, contracts are written as Java code within quoted strings, embedded in annotations. Here is some sample code (derived from lost icontract library): A basic stack with methods to push, pop and to see the top element.
import java.util.LinkedList;
import com.google.java.contract.Ensures;
import com.google.java.contract.Invariant;
import com.google.java.contract.Requires;

@Invariant({ "elements != null",
             "isEmpty() || top() != null" }) // (1)
public class CofojaStack<T> {

  private final LinkedList<T> elements = new LinkedList<T>();

  @Requires("o != null") // (2)
  @Ensures({ "!isEmpty()", "top() == o" }) // (3)
  public void push(T o) {
    elements.add(o);
  }

  @Requires("!isEmpty()")
  @Ensures({ "result == old(top())", "result != null" })
  public T pop() {
    final T popped = top();
    elements.removeLast();
    return popped;
  }

  @Requires("!isEmpty()")
  @Ensures("result != null")
  public T top() {
    return elements.getLast();
  }

  public boolean isEmpty() {
    return elements.isEmpty();
  }

}
The annotations describe method preconditions (2), postconditions (3) and class invariants (1). Cofoja uses a Java 6 annotation processor to create .contract class files for the contracts. As soon as Cofoja's Jar is on the classpath the annotation processor is picked up by the service provider. There is no special work necessary.
javac -cp lib/cofoja.asm-1.3-20160207.jar -d classes src/*.java
To verify that the contracts are executed, here is some code which breaks the precondition of our stack:
import org.junit.Test;
import com.google.java.contract.PreconditionError;

public class CofojaStackTest {

  @Test(expected = PreconditionError.class)
  public void emptyStackFailsPreconditionOnPop() {
    CofojaStack<String> stack = new CofojaStack<String>();
    stack.pop(); // (4)
  }

}
We expect line (4) to throw Cofoja's PreconditionError instead of NoSuchElementException. Just running the code is not enough, Cofoja uses a Java instrumentation agent to weave in the contracts at runtime.
java -javaagent:lib/cofoja.asm-1.3-20160207.jar -cp classes ...
Cofoja is an interesting library and I wanted to use it to tighten my precondition checks. Unfortunately I had a lot of problems with the setup. Also I had never used annotation processors before. I compiled all my research into a complete setup example.

Maven
Someone already created an example setup for Maven. Here are the necessary pom.xml snippets to compile and run CofojaStackTest from above.
<dependencies>
  <dependency> <!-- (5) -->
    <groupId>org.huoc</groupId>
    <artifactId>cofoja</artifactId>
    <version>1.3.1</version>
  </dependency>
  ...
</dependencies>

<build>
  <plugins>
    <plugin> <!-- (6) -->
      <artifactId>maven-surefire-plugin</artifactId>
      <version>2.20</version>
      <configuration>
        <argLine>-ea</argLine>
        <argLine>-javaagent:${org.huoc:cofoja:jar}</argLine>
      </configuration>
    </plugin>
    <plugin> <!-- (7) -->
      <artifactId>maven-dependency-plugin</artifactId>
      <version>2.9</version>
      <executions>
        <execution>
          <id>define-dependencies-as-properties</id>
          <goals>
            <goal>properties</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
Obviously we need to declare the dependency (5). All examples I found register the contracts' annotation processor with the maven-compiler-plugin, but that is not necessary if we are using the Maven defaults for source and output directories. To run the tests through the agent we need to enable the agent in the maven-surefire-plugin (6) like we did for plain execution with java. The Jar location is ${org.huoc:cofoja:jar}. To enable its resolution we need to run maven-dependency-plugin's properties goal (7). Cofoja is build using Java 6 and this setup works for Maven 2 and Maven 3.

Gradle
Similar to Maven, but usually shorter, we need to define the dependency to Cofoja (5) and specify the Java agent in the JVM argument during test execution (6). I did not find a standard way to resolve a dependency to its Jar file and several solutions are possible. The cleanest and shortest seems to be from Timur on StackOverflow, defining a dedicated configuration for Cofoja (7), which avoids duplicating the dependency in (5) and which we can use to access its files in (6).
configurations { // (7)
  cofoja
}

dependencies { // (5)
  cofoja group: 'org.huoc', name: 'cofoja', version: '1.3.1'
  compile configurations.cofoja.dependencies
  ...
}

test { // (6)
  jvmArgs '-ea', '-javaagent:' + configurations.cofoja.files[0]
}
Eclipse
Even when importing the Maven project into Eclipse, the annotation processor is not configured and we need to register it manually. Here is the Eclipse help how to do that. Fortunately there are several step by step guides how to set up Cofoja in Eclipse. In the project configuration, enable Annotation Processing under the Java Compiler settings.
Eclipse Project Annotation Processing
Although Eclipse claims that source and classpath are passed to the processor, we need to configure source path, classpath and output directory.
com.google.java.contract.classoutput=%PROJECT.DIR%/target/classes
com.google.java.contract.classpath=%PROJECT.DIR%/lib/cofoja.asm-1.3-20160207.jar
com.google.java.contract.sourcepath=%PROJECT.DIR%/src/main/java
(These values are stored in .settings/org.eclipse.jdt.apt.core.prefs.) For Maven projects we can use the %M2_REPO% variable instead of %PROJECT.DIR%.
com.google.java.contract.classpath=%M2_REPO%/org/huoc/cofoja/1.3.1/cofoja-1.3.1.jar
Add the Cofoja Jar to the Factory Path as well.
Eclipse_ Project Factory Path
Now Eclipse is able to compile our stack. To run the test we need to enable the agent.
Eclipse Run Configuration JUnit
IntelliJ IDEA
StackOverflow has the answer how to configure Cofoja in IntelliJ. Enable annotation processing in Settings > Build > Compiler > Annotation Processors.
IDEA Settings Annotation Processors
Again we need to pass the arguments to the annotation processor.
com.google.java.contract.classoutput=$PROJECT_DIR$/target/classes
com.google.java.contract.classpath=$M2_REPO$/org/huoc/cofoja/1.3.1/cofoja-1.3.1.jar
com.google.java.contract.sourcepath=$PROJECT_DIR$/src/main/java
(These values are stored in .idea/compiler.xml.) For test runs we enable the agent.
IDEA Run Configuration JUnit
That's it. See the complete example's source including all configuration files for Maven, Gradle, Eclipse and IntelliJ IDEA.

18 February 2018

Checking Python for Compliance with Object Calisthenics

To use one of my Object Orientation workshops for a different client, I needed its whole setup in Python. (To summarise the workshop: I used Jeff Bay's Object Calisthenics as constraint and static code analysis to check the code for compliance. This helped participants follow the rules.)

I found that Pylint already contained some checkers I could use out of the box to check Python code:
  • checkers.refactoring with setting max-nested-blocks=1 for rule #1 (Use only one level of indentation per method.)
  • checkers.design_analysis with setting max-branches=1 also for rule #1, but stronger, which I like more.
  • checkers.design_analysis with setting max-attributes=2 for rule #7 (Don't use any classes with more than two instance variables.)
Carpet PythonDiving deeper into Pylint's custom checkers, I created some of my own to enforce rules #2, #4, #6, #7, #8 and #9. For rule #4 (One Dot per Line) I used the approach of PHP_CodeSniffer and looked for nested expressions, allowing only one dot per statement, ignoring self references.

Dynamic Nature
Due to the dynamic nature of Python there is a grey area when working with types. For first-class collections (rule #8), only certain uses of collections are identified at the moment. I was not able to check for primitives at object boundaries at all, so rule #3 is not verified.

Figuring out how to write complex checkers for Pylint was difficult at first, but also a lot of fun. Like PMD and most static analysis tools I know, Pylint works with the Abstract Syntax Tree of the code and checkers are built using the Visitor design pattern. The main problem was figuring out which types of nodes exist, then the implementation of the rules was straight forward.

Python Project Setup
The setup is similar to the Java project. I created a repository for the LCD Numbers under lcd-numbers-object-calisthenics-python-setup. To test your setup there is some code in the project and ./run_pylint will show its violations. Pylint is configured using the objectcalisthenics.pylintrc file in the project directory. The individual checkers are in ./pylint/checkers.

11 January 2018

Compliance with Object Calisthenics

During my work as Code Cop I run many workshops. Sometimes I use constraints to make exercises more focused or more intense. Some constraints, like the Brutal Coding Constraints, are composite or aggregate constraints, which means that they are a combination of several simpler or low level constraints. (Here simple does not mean that they are easy to follow, rather that they focus on a single thing.) Today I want to discuss Object Calisthenics.

some warmup calisthenicsObject Calisthenics
Jeff Bay's Object Calisthenics is an aggregate constraint combined of the following nine rules:
  1. Use only one level of indentation per method.
  2. Don't use the else keyword.
  3. Wrap all primitives and strings (in public API).
  4. Use only one dot per line.
  5. Don't abbreviate (long names).
  6. Keep all entities small.
  7. Don't use any classes with more than two instance variables.
  8. Use first-class collections.
  9. Don't use any getters/setters/properties.
If you are not familiar with these rules, I recommend you read Jeff Bay's original essay published in the The ThoughtWorks Anthology in 2008. William Durand's post is an exact copy of that essay, so no need to buy the book for that. Several people have followed up and discussed their interpretation and experience with Object Calisthenics, e.g. Mark Needham, Jeff Pace, Vasiliki Vockin and Juan Antonio.

Object Calisthenics is an exercise in object orientation. How is that? One of the core concepts of OOP is Abstraction: A class should capture one and only one key abstraction. Obviously primitive and built-in types lack abstraction. Wrapping primitives (rule #3) and wrapping collections (rule #8) drive the code towards more abstractions. Small entities (rule #6) help to keep our abstractions focused.

Further objects are defined by what they do, not what they contain. All data must be hidden within its class. This is Encapsulation. Rule #9, No Properties, forces you to stay away from accessing the fields from outside of the class.

Next to Abstraction and Encapsulation, these nine rules help Loose Coupling and High Cohesion. Loose Coupling is achieved by minimizing the number of messages sent between a class and its collaborator. Rule #4, One Dot Per Line, reduces the coupling introduced by a single line. This rule is misleading, because what Jeff really meant was the Law Of Demeter. The law is not about counting dots per line, it is about dependencies: "Only talk to your immediate friends." Sometimes even a single dot in a line will violate the law.

High Cohesion means that related data and behaviour should be in one place. This means that most of the methods defined on a class should use most of the data members most of the time. Rule #5, Don't Abbreviate, addresses this: When a name of a field or method gets long and we wish to shorten it, obviously the enclosing scope does not provide enough context for the name, which means that the element is not cohesive with the other elements of the class. We need another class to provide the missing context. Next to naming, small entities (rule #6) have a higher probability of being cohesive because there are less fields and methods. Limiting the number of instance variables (rule #7) also keeps cohesion high.

The remaining rules #1 and #2, One Level Of Indentation and No else aim to make the code simpler by avoiding nested code constructs. After all who wants to be a PHP Street Fighter. ;-)

Checking Code for Compliance with Object Calisthenics
When facilitating coding exercises with composite constraints, I noticed how easy it is to overlook certain violations. We are used to conditionals or dereferencing pointers that we might not notice them when reading code. Some rules like the Law Of Demeter or a maximum size of classes need a detailed inspection of the code to verify. To check Java code for compliance with Object Calisthenics I use PMD. PMD contains several rules we can use:
  • Rule java/coupling.xml/LawOfDemeter for rule #4.
  • Rule #6 can be checked with NcssTypeCount. A NCSS count of 30 is usually around 50 lines of code.
    <rule ref="rulesets/java/codesize.xml/NcssTypeCount">
        <properties>
            <property name="minimum" value="30" />
        </properties>
    </rule>
  • And there is TooManyFields for rule #7.
    <rule ref="rulesets/java/codesize.xml/TooManyFields">
        <properties>
            <property name="maxfields" value="2" />
        </properties>
    </rule>
I work a lot with PMD and have created custom rules in the past. I added rules for Object Calisthenics. At the moment, my Custom PMD Rules contain a rule set file object-calisthenics.xml with these rules:
  • java/constraints.xml/NoElseKeyword is very simple. All else keywords are flagged by the XPath expression //IfStatement[@Else='true'].
  • java/codecop.xml/FirstClassCollections looks for fields of known collection types and then checks the number of fields.
  • java/codecop.xml/OneLevelOfIntention
  • java/constraints.xml/NoGetterAndSetter needs a more elaborate XPath expression. It is checking MethodDeclarator and its inner Block/ BlockStatement/ Statement/ StatementExpression/ Expression/ PrimaryExpressions.
  • java/codecop.xml/PrimitiveObsession is implemented in code. It checks PMD's ASTConstructorDeclaration and ASTMethodDeclaration for primitive parameters and return types.
For the nitty-gritty details of all the rules have a look at codecop.xml and constraints.xml.

AutomaticInterpretation of Rules: Indentation
When I read Jeff Bay's original essay, the rules were clear. At least I thought so. Verifying them automatically showed some areas where different interpretations are possible. Different people see Object Calisthenics in different ways. In comparison, the Object Calisthenics rules for PHP_CodeSniffer implement One Level Of Indentation by allowing a nesting of one. For example there can be conditionals and there can be loops, but no conditional inside of a loop. So the code is either formatted at method level or indented one level deep. My PMD rule is more strict: Either there is no indentation - no conditional, no loop - or everything is indented once: for example, if there is a loop, than the whole method body must be inside this loop. This does not allow more than one conditional or loop per method. My rule follows Jeff's idea that each method does exactly one thing. Of course, I like my strict version, while my friend Aki Salmi said that I went to far as it is more like Zero Level Of Indentation. Probably he is right and I will recreate this rule and keep the Zero Level Of Indentation for the (upcoming) Brutal version of Object Calisthenics. ;-)

Wrap All Primitives
There is no PHP_CodeSniffer rule for that, as Tomas Votruba considers it "too strict, vague or annoying". Indeed, this rule is very annoying if you use primitives all the way and your only data structure is an associative array or hash map. All containers like java.util.List, Set or Map are considered primitive as well. Samir Talwar said that every type that was not written by yourself is primitive because it is not from your domain. This prohibits the direct usage of Files and URLs to name a few, but let's not go there. (Read more about the issue of primitives in one of my older posts.)

My rule allows primitive values in constructors as well as getters to implement the classic Value Object pattern. (The rule's implementation is simplistic and it is possible to cheat by passing primitives to constructors. And the getters will be flagged by rule #9, so no use for them in Object Calisthenics anyway.)

I agree with Tomas that this rule is too strict, because there is no point in wrapping primitive payloads, e.g. strings that are only displayed to the user and not acted on by the system. These will be false positives. There are certain methods with primitives in their signatures like equals and hashCode that are required by Java. Further we might have plain numbers in our domain or we use indexing of some sort, both will be false positives, too.

One Dot Per Line
As I said before, I use PMD's LawOfDemeter to verify rule #4. The law allows sending messages to objects that are
  • the immediate parts of this or
  • the arguments of the current method or
  • objects created inside the current method or
  • objects in global variables.
I did not look at PMD's source code to check the implementation of this rule - but it complains a lot. For me this is the most difficult rule of all nine rules. (I code according to #1, #3, #5 and #6 and can easily adapt to strictly follow #2, #7, #8 and #9.) Although it complains a lot, I found every violation correct. I learned much about Law Of Demeter by checking my code for violations. For example, calling methods on an element of an array is a violation. The indexed array access is similar to a pointer access. (In Ruby this is obvious because Array defines a method def [](index).) Another interesting fact is that (at least in PMD) the law flags calling methods on enums. The enum instances are not created locally, so we cannot send them messages. On the other hand, an enum is a global variable, so maybe it should be allowed to call methods on it.

The PHP_CodeSniffer rule follows the rule's name and checks that there is only one dot per line. This creates better code, because train wrecks will be split into explaining variables which make debugging easier. Also Tomas is checking for known fluent interfaces. Fluent interfaces - by definition - look like they are violating the Law Of Demeter. As long as the fluent interface returns the same instance, as for example basic builders do, there is no violation. When following a more relaxed version of the law, the Class Version of Law Of Demeter, than different implementations of the same type are still possible. The Java Stream API, where many calls return a new Stream instance of a different class - or the same class with a different generic type - is likely to violate the law. It does not matter. Fluent interfaces are designed to improve readability of code. Law Of Demeter violations in fluent interfaces are false positives.

Don't Abbreviate
I found it difficult to check for abbreviations, so rule #5 is not enforced. I thought of implementing this rule using a dictionary, but that is prone to false positives as the dictionary cannot contain all terms from all domains we create software for. The PHP_CodeSniffer rules check for names shorter than three characters and allow certain exceptions like id. This is a good start but is not catching all abbreviations, especially as the need to abbreviate arises from long names. Another option would be to analyse the name for its camel case patterns, requiring all names to contain lowercase characters between the uppercase ones. This would flag acronyms like ID or URL but no real abbreviations like usr or loc.

Small Entities
Small is relative. Different people use different limits depending on programming language. Jeff Bay's 50 lines work well for Java. Rafael Dohms proposes to use 200 lines for PHP. PHP_CodeSniffer checks function length and number of methods per class, too. Fabian Schwarz-Fritz limits packages to ten classes. All these additional rules follow Jeff Bay's original idea and I will add them to the rule set in the future.

Two Instance Variables
Allowing only two instance variables seems arbitrary - why not have three or five. Some people have changed the rules to allow five fields. I do not see how the choice of language makes a difference. Two is the smallest number that allows composition of object trees.

In PHP_CodeSniffer there is no rule for this because the number depends on the "individual domain of each project". When an entity or value object consists of three or more equal parts, the rule will flag the code but there is no problem. For example, a class BoundingBox might contain four fields top, left, bottom, right. Depending on the values, introducing a new wrapper class Coordinate to reduce these fields to topLeft and bottomRight might make sense.

No Properties
My PMD rule finds methods that return an instance field (a getter) or update it (a setter). PHP_CodeSniffer checks for methods using the typical naming conventions. It further forbids the usage of public fields, which is a great idea. As we wrapped all primitives (rule #3) and we have no getters, we can never check their values. So how do we create state based tests? Mark Needham has discussed "whether we should implement equals and hashCode methods on objects just so that we can test their equality. My general feeling is that this is fine although it has been pointed out to me that doing this is actually adding production code just for a test and should be avoided unless we need to put the object into a HashMap or HashSet."

From what I have seen, most object oriented developers struggle with that constraint. Getters and setters are very ingrained. In fact some people have dropped that constraint from Object Calisthenics. There are several ways to live without accessors. Samir Talwar has written why avoiding Getters, Setters and Properties is such a powerful mind shift.

Java Project Setup
I created two repositories containing starting points for the LCD Numbers and Minesweeper Kata:Both are Apache Maven projects. The projects are set up to check the code using the Maven PMD Plugin on each test execution. Here is the relevant snippet from the pom.xml:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-pmd-plugin</artifactId>
      <version>3.7</version>
      <configuration>
        <printFailingErrors>true</printFailingErrors>
        <linkXRef>false</linkXRef>
        <typeResolution>true</typeResolution>
        <targetJdk>1.8</targetJdk>
        <sourceEncoding>${encoding}</sourceEncoding>
        <includeTests>true</includeTests>
        <rulesets>
          <ruleset>/rulesets/java/object-calisthenics.xml</ruleset>
        </rulesets>
      </configuration>
      <executions>
        <execution>
          <phase>test</phase>
          <goals>
            <goal>check</goal>
          </goals>
        </execution>
      </executions>
      <dependencies>
        <dependency>
          <groupId>org.codecop</groupId>
          <artifactId>pmd-rules</artifactId>
          <version>1.2.3</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
You can add this snippet to any Maven project and enjoy Object Calisthenics. The Jar file of pmd-rules is available in my personal Maven repository.

To test your setup there is sample code in both projects and mvnw test will show two violations:
[INFO] PMD Failure: SampleClass.java:2 Rule:TooManyFields Priority:3 Too many fields.
[INFO] PMD Failure: SampleClass:9 Rule:NoElseKeyword Priority:3 No else keyword.
It is possible to check the rules alone with mvnw pmd:check. (Using the Maven Shell the time to run the checks is reduced by 50%.) There are two run_pmd scripts, one for Windows (.bat) and one for Linux (.sh).

Object Calisthenics RetrospectiveLimitations of Checking Code
Obviously code analysis cannot find everything. On the other hand - as discussed earlier - some violations will be false positives, e.g. when using the Stream API. You can use // NOPMD comments and @SuppressWarnings("PMD") annotations to suppress false positives. I recommend using exact suppressions, e.g. @SuppressWarnings("PMD.TooManyFields") to skip violations because other violations at the same line will still be found. Use your good judgement. The goal of Object Calisthenics is to follow all nine rules, not to suppress them.

Learnings
Object Calisthenics is a great exercise. I used it all of my workshops on Object Oriented Programming and in several exercises I did myself. The verification of the rules helped me and the participants to follow the constraints and made the exercise more strict. (If people were stuck I sometimes recommended to ignore one or another PMD violations, at least for some time.) People liked it and had insights into object orientation: It is definitely a "different" and "challenging way to code". "It is good to have small classes. Now that I have many classes, I see more structure." You should give it a try, too. Jeff Bay even recommends to run an exercise or prototype of at least 1000 lines for at least 20 hours.

The question if Object Calisthenics is applicable to real working systems remains. While it is excellent for exercise, it might be too strict to be used in production. On the other hand, in his final note, Jeff Bay talks about a system of 100,000 lines of code written in this style, where the "people working on it feel that its development is so much less tiresome when embracing these rules".