16 August 2016

Absolute Priority Premise, an Example

Optimus Prime, Transformers ToysTransformation Priority Premise
I would like to start this article with Uncle Bob Martin's Transformation Priority Premise - TPP for short - which he defined in 2011. With the premise Uncle Bob gave an algorithm how to find the next test when doing Test Driven Development. During (classic) TDD we use transformations to change the code to get from red to green. For example when starting with a new method the very first test is usually red because the empty or generated method returns null. We fake the first test by changing that null to return a constant value, e.g. return 5. This transformation is called nil->constant and has a high priority. A later test might force us to add a conditional to enable another fake. This transformation is called unconditional->if and has a medium priority. Replacing the value of a variable, i.e. variable->assignment has a low priority.

According to Uncle Bob, these transformations have a preferred ordering. We should prefer higher priority transformations to pass tests and chose tests in a way that they can be passed with higher priority transformations. When an implementation seems to require a low priority transformation, we could backtrack to see if there is another test to pass first which does not need that transformation. The theme behind the TPP is that as the tests get more specific, the code gets more generic. If you want to know more about TPP, see Uncle Bob's cartoon follow-up on the TPP and Sorting and his talk on the Transformation Priority Premise.

Absolute Priority Premise
In 2012 Micah Martin set out to define some heuristics to compare code objectively and proof that some code is better than some other code. Building on the TPP's priorities he defined the Absolute Priority Premise - APP for short. The APP knows six components, i.e. basic building blocks of code, and assigns them a mass. The building blocks and their weights were
  • constant, a value in code has mass of 1.
  • binding, a name or variable has a mass of 1, too.
  • invocation, calling a method or function - mass 2.
  • conditional, any form of if, switch or case - mass 4.
  • loop, for or while loops - mass 5.
  • assignment, replacing the value of a variable - mass 6.
More complex building blocks have a higher mass. The total mass of a piece of code is the sum of the mass of its respective components. A lower value is better. The best set of specific values of mass are unknown and I will use Micah's weights. If you chose to count different, please let us discuss!

A detailed explanation of the Absolute Priority Premise is given in the two presentations of 8th Light University (8LU) - Part One and Part Two. See also Micah's Coin Changer Kata as a complete example of applying the premise.

Measuring the Mass of Code
I like Micah's idea to measure the mass of code. It might not be a direct indication of readability but simpler code is always better. I wrote six different versions of the Word Wrap kata and was wondering which one would be considered the "best". I should have calculated the mass of these algorithms manually, but I followed Terence Parr's advice, to avoid working by hand five days what I can spend five years of my life automating. I had to create a tool to calculate the mass of Java code - which of course took me much longer, especially as I verified the mass of each algorithm manually to make sure my code worked as expected.

Absolute Priority Counter
I created abpricou, the ABsolute PRIority COUnter. It parses Java source and collects its mass as defined by the Absolute Priority Premise. It is written in Python, because it was Python month when I started working on it. It uses the Python Java Parser plyj. plyj offers a Visitor API for parser events,
class CountingVisitor(m.Visitor):
    def __init__(self):
        super(CountingVisitor, self).__init__()
        ...
Constants and bindings are staight forward, e.g.
    def visit_Literal(self, literal):
        return self._a_constant_value()

    def visit_FormalParameter(self, parameter):
        return self._a_name()
Literals in the code are counted as constants and parameters are only names for values, thus bindings. A method or function counts as a constant for the code it represents and a binding for its name.
    def visit_MethodDeclaration(self, declaration):
        self._a_name()
        return self._code_is_a_constant()
Invocations, conditionals and loops are the same. For example
    def visit_MethodInvocation(self, invocation):
        return self._an_invocation()

    def visit_IfThenElse(self, conditional):
        return self._a_conditional()

    def visit_While(self, loop):
        return self._a_loop()
The only interesting case is the assignment. Not every assignment in Java is counted as an assignment, only re-assignments which modify values are counted. final fields or local variables are just names for expressions similar to parameters.
    def visit_Assignment(self, assignment):
        if ... :
            # code to ignore
            # final field is assigned in constructor
            # final local is assigned in block
        else:
            return self._an_assignment()
View the complete source of the Absolute Priority Counter on Bitbucket.

Let's see some code: Word Wrap Kata
As I said before, I developed the counter to calculate the mass of my different implementations of Word Wrap. I was interested in the mass of the algorithm and skipped constructs like class definitions. (I suppose a class definition is a constant for the code and a binding for the name.) I also ignored Annotations, Enumerations and Generics. The first algorithm uses recursion to loop over the blanks to find the end of each line. Its code is
   final char BLANK = ' ';
   final char NEWLINE = '\n';

   String wrapRecursive(String line, int maxLineLen) {
      if (line.length() <= maxLineLen) {
         return line;
      }

      int indexOfBlank = line.lastIndexOf(BLANK, maxLineLen);
      int split;
      int offset;
      if (indexOfBlank > -1) {
         split = indexOfBlank;
         offset = 1;
      } else {
         split = maxLineLen;
         offset = 0;
      }
      return line.substring(0, split) + NEWLINE
           + wrap(line.substring(split + offset), maxLineLen);
  }
including the components
| Component   | Mass  | Count |
| constant    |   1   |   7   |
| binding     |   1   |   8   |
| invocation  |   3   |  10   |
| conditional |   4   |   2   |
| loop        |   5   |   0   |
| assignment  |   6   |   0   |
resulting in a total mass of 53. Further implementations have the following masses:Scales
  • The tail recursive solution has neither loops nor assignments similar to the recursive one and has a total mass of 71. (The source of this and all further variants is given in Word Wrap Kata Variants.)
  • The looping variant contains a loop and an assignment resulting in a mass of 68.
  • The optimised loop which avoids temporary String objects, contains a loop and an assignment and some more invocations. Its mass is 80.
  • The loop using a buffer saves even more heap allocations and needs more invocations summing up to 105.
  • The solution using Regular Expressions as shown in the original article has a mass of 55. Merging the parts of the expression into a single expression would save five invocations, but makes the expression less readable. I argue that the Regular Expression itself has some weight, as it contains one conditional ("|" in line 3) and two loops ("(.{1," + maxLineLen + "})" in lines 1 and 4). So a fair weight is 69.
  • The functional version using flatMap and reduce is quite verbose due the lack of inferred pair types in Java. As plyj does not support Java 8 I was unable to measure its mass.
First Conclusion on Algorithms
The most basic, recursive version of Word Wrap has the least weight. What does it mean? Is it the best version? It is the most compact version without playing Code golf, at least in Java. It does not mutate any variables but it puts the highest load on the Garbage Collector. All the discussed algorithms have different memory and run time performance characteristics. I had hoped for a clear answer what the best version would be and I am not seeing that. I challenge you to leave a comment with a version of Word Wrap with a smaller weight. Would it be considered better code?

The mass of the basic loop version seems too much compared to the recursive version. The current weights favour functional programming by putting a penalty on loops and assignments. Mutating a local variable has a smaller weight than mutating the state of an object. The looping version uses StringBuilder instead of plain String + which needs two more invocations for its construction. Then its components are
| Component        | Mass  | Count |
| constant         |   1   |   8   |
| binding          |   1   |  10   |
| local assignment |   1   |   3   | like a new local binding
| invocation       |   3   |  10   |
| conditional      |   4   |   1   |
| loop             |   5   |   1   |
| assignment       |   6   |   0   |
and its mass is 60 which is a bit more code than the recursive solution.

More Structure
The different wrap() functions above are just algorithms. They are not factored into parts that might change at different speeds over time and definitely violate the SRP. They also break the OCP as it is impossible to change the strategy of splitting without touching the logic of collecting. To address this I wrote another recursive version following Tell, don't ask,
interface Page {
   void renderLine(String lineOfProperLength);
}

class Wrapper {

   final char BLANK = ' ';

   final Page page;
   final int maxLineLen;

   Wrapper(Page page, int maxLineLen) {
      this.page = page;
      this.maxLineLen = maxLineLen;
   }

   void wrap(String line) {
      if (line.length() <= maxLineLen) {
         page.renderLine(line);
         return;
      }

      int indexOfBlank = line.lastIndexOf(BLANK, maxLineLen);
      int split;
      int offset;
      if (indexOfBlank > -1) {
         split = indexOfBlank;
         offset = 1;
      } else {
         split = maxLineLen;
         offset = 0;
      }

      page.renderLine(line.substring(0, split));
      wrap(line.substring(split + offset));
   }
}
which is very similar to the tail recursive solution above, with a total weight of 56. The number is smaller because it does not contain the necessary implementation of Page.renderLine(). The shown code does less, which makes it difficult to compare. On the other hand Page.renderLine() might be used by other parts of the code as well, so it is not strictly a part of Word Wrap.

Moar Structure!1!!
Later I created an extremely factored and decomposed implementation of Word Wrap that separated concepts of rendering, splitting, accumulating lines and hyphenation rules which should fulfil both SRP and OCP. Ignoring the HyphenationRule because that is not covered by the other solutions, its mass is 167 due to many method invocations and parameter names:
| Component   | Mass  | Count |
| constant    |   1   |  17   |
| binding     |   1   |  33   |
| invocation  |   3   |  30   |
| conditional |   4   |   4   |
| loop        |   5   |   1   |
| assignment  |   6   |   1   |
There is a loop and an assignment, at its core this implementation is a looping one. Using the recursive solution I should be able to get rid of the loop and the assignment.

Conclusion
The Absolute Priority Premise counts the different components of a program and sums their weights. If the weights are correct than the program with the smallest mass would be considered the "best" program. This is not true for algorithms like the Word Wrap because the APP ignores features like memory usage or performance optimisation. For general purpose code the validity of the APP is unclear. It just measures the number of code constructs, favouring more compact, functional code. On the other hand the four rules of simple design encourage us to introduce explaining variables to reveal the intent of the code which is more important than fewer elements.

No comments: