6 November 2015

Bank-OCR Kata in Scheme

Why is it when I wanted to do something completely different from work to relax - I end doing code katas in Scheme? (Last week on Twitter.)

SoCraTes Belgium
Two weeks ago I attended SoCraTes Belgium, the Belgian branch of the Software Craftsmanship and Testing family of (un-)conferences. Being an un-conference the complete agenda of both days was created by the participants. Early during the first day, a participant proposed a session to work through the exercises of the well known SICP book using the Scheme programming language. We worked through the exercises as a mob and it was a lot of fun so another session was scheduled for afternoon. Time and again Scheme sessions were scheduled outside the official agenda. In the end I had spent several hours playing with Scheme and I really enjoyed it.

Why Scheme?
Later I started doing code katas in Scheme, which surprised me, see my initial quote. I do not know why I chose Scheme. There were other programming languages I had planned to learn. Maybe, as my friend Thomas remarked, I chose Scheme because it is seldom used in commercial projects, at least in my surroundings. As it is far away from anything I might touch during my regular work, it is easier to have fun with.

Unit Testing
Nevertheless I wanted to follow my typical development process, using TDD and looked for recommended unit testing frameworks. As minimalism is in the spirit of Scheme, I went with the Scheme Unit outlined on Cunningham's Wiki,
(define (report-error msg)
    (error (string-append "AssertionError: " msg)))

(define (assert msg b)
    (if (not b)
        (report-error msg)))
and added assertions whenever I needed them.

Banking the LightBank OCR
Today I want to share my take on the Bank OCR Kata using Scheme. The Bank OCR assignment is to parse files containing lists of numbers written in LCD style using only pipes and underscores. Each number has nine digits, all of which are in the range one to nine. I knew the Bank OCR kata and had done it before using different languages like Java or C#. I was familiar with the domain which allowed me to focus on functional programming in Scheme.

In the past I used the Bank OCR kata especially to practise the outside-in way of development. Using this approach you build the system from the "outside-in", following the user interaction through all the parts of the system. You start with the interactions and collaborators up-front, especially those at the top level and create fake implementations or mock necessary dependencies. With every finished component, you move to the previously mocked collaborators and implement them. See Emily Bache's article on Outside-In development for a discussion of Outside-In both with London school and classic TDD.

Outside-In vs. Functional?
So whenever I did the Bank OCR kata I tried to follow strict outside-in. But this time I wondered if the outside-in approach was feasible when using a functional language? As far as I knew the typical way of functional programming was to compose small functions to more powerful ones, which naturally lent itself to the bottom-up or classic approach. I was curious how these two would match, if at all.

The Guiding Test
Following Double Loop TDD I started with failing guiding test to parse a single number containing all possible digits,
(define all-digits (list "    _  _     _  _  _  _  _ "
                         "  | _| _||_||_ |_   ||_||_|"
                         "  ||_  _|  | _||_|  ||_| _|"
                         "                           "))

(assert-list= string=?
              "should parse a single number"
              (list "123456789")
              (bank-ocr all-digits))
which expected that bank-ocr(all-digits) yielded ["123456789"].

How to solve the problem?
Then I started to TDD the top level function bank-ocr.
(assert-list= string=?
              "should return empty list on empty input"
              (bank-ocr (list)))
which created the initial function. Then I tested for a non-trivial case
(assert-list= string=?
              "not sure about the name of the test yet"
              (list "123456789")
              (bank-ocr all-digits))
Big Ben (London School TDD is Outside-In but Classic TDD can be as well.)But how would I solve the problem? I had no idea. Nevertheless, the first step of the algorithm was to split the input into groups of four lines each and another function, e.g. parse-line, would parse the line then. Following outside-in I defined a stub for parse-line and changed bank-ocr to call it.
;; stub
(define (parse-line ocr-line)

(define (bank-ocr ocr-lines)
    (if (null? ocr-lines)
        (list (parse-line ocr-lines))))
The next test forced me to implement the recursion to call parse-line for each group of four lines.
(assert-list= string=?
              "should parse each group of lines"
              (list "123456789" "123456789")
              (bank-ocr (append all-digits all-digits)))

(define (bank-ocr ocr-lines)
    (if (null? ocr-lines)
        (append (list (parse-line (take ocr-lines 3)))
                (bank-ocr (drop ocr-lines 4)))))
Moving "in"
bank-ocr was complete but the guiding test told me that there was no parse-line function in the production code and I knew where to go next.
(define ocr-digit-one (list "   "
                            "  |"
                            "  |"))

;; should split and parse first digit
(assert-string= "1" (parse-line ocr-digit-one))
Parsing a line would need to split the line into digits and then parse each digit. I added another two stubbed functions and built parse-line to get the test green.
;; stub
(define (split-digits ocr-line)

;; stub
(define (parse-digits ocr-digits)
    ;; use assert-list= to check that ocr-digits is ocr-digit-one

(define (parse-line ocr-line)
    (parse-digits (split-digits ocr-line)))
Parenthesis(Actually I was cheating here: I should have checked that the output of split-digits was fed into parse-digits. Nobody is perfect and I will atone for that later, but let's move on for now.) Again a function was finished but I had invented two new collaborating functions to do so.

Another step "outside-in"
Next came testing split-digits to split the three lines into nine digits containing three lines of three characters each.
(define two-ocr-digit-one (list "      "
                                "  |  |"
                                "  |  |"))

;; missing test "should split empty line into no digits"

(assert-list= (list-equals-for string=?)
              "should split single digit"
              (list ocr-digit-one)
              (split-digits ocr-digit-one))

(assert-list= (list-equals-for string=?)
              "should split two digits"
              (list ocr-digit-one ocr-digit-one)
              (split-digits two-ocr-digit-one))

(define (split-digits ocr-line)
    (define (take-3-chars s)
        (substring s 0 3))
    (define (drop-3-chars s)
        (substring s 3 (string-length s)))
    (if (zero? (string-length(car ocr-line)))
        (append (list (map take-3-chars ocr-line))
                (split-digits (map drop-3-chars ocr-line)))))
I did not start with the degenerate test-case that an empty line, a list of three empty strings, should be split into an empty list of digits. I did not add this test because it did not feel right from the solution's perspective. split-digits would always be called with a full line, i.e. three strings of 27 characters each. But as soon as I tried to get the recursion for the second digit right (as forced by the second test), I struggled because I had to figure out the recursion and termination condition at the same time.

ParenthesisA Functional TDD "Pattern"
There is some obvious pattern here. Consider we need a function that operates on a list of inputs and processing of a single input is either simple or can be delegated to another function. Then we need three tests to drive the implementation of that function:
  1. An empty input should produce an empty output, where empty is defined differently for input and output. This drives the creation of the function header and the body of the (future) termination condition.
  2. A single input should produce a single output. This drives the conditional for the termination condition and the processing of a single input. The processing must be simple otherwise the step is too large.
  3. A list of inputs should produce a list of outputs. This test drives the splitting of the first input from the remaining ones for the recursion.

Coming to an end
The second missing function was parse-digits. It was supposed to work on a list of digits, to parse each of them and return the list of parsed digits so I used my three steps from above.
;; should parse empty digits as empty string
(assert-string= "" (parse-digits (list)))

;; stub
(define (parse-digit ocr-digit)

;; test for parsing a single digit omitted

;; should parse digits into numbers for each digit
(assert-string= "111"
                (parse-digits (list ocr-digit-one ocr-digit-one ocr-digit-one)))

(define (parse-digits ocr-digits)
    (if (null? ocr-digits)
        (string-append (parse-digit (car ocr-digits))
                       (parse-digits (cdr ocr-digits)))))
I skipped step two of my list above and omitted the test for parsing a single digit because I felt confident and delegated the actual parsing of a single digit to another function. parse-digit was the final function and compared a given digit against stored digits to determine the number.
(assert-string= "should parse one"
                (parse-digit ocr-digit-one))

(assert-string= "should parse two"
                (parse-digit ocr-digit-two))

;; etc.

(define (parse-digit ocr-digit)
    (let ((digit (apply string-append ocr-digit)))
        (cond ((string=? digit (string-append "   "
                                              "  |"
                                              "  |")) "1")
        ;; etc.
I did not push the final solution of parse-digit. Probably I could remove the duplication using some functional magic, but I had spent some hours already on coding, it was late and I was tired. The full source with history is available in my Scheme kata repository.

Using Scheme was fun. I had to look up library functions a lot and spent some time on Stackoverflow, but I felt progress all the time. I committed on red, green and refactor, on average each ten minutes, and I was never stuck. The minimalist unit testing function gave me enough feedback to move forward quickly. I did not bother for expressive assertion messages because my steps were small and I never looked at the failures anyway - I knew which test would fail or pass next. Solving Bank OCR was straight forward, probably due to the nature of the assignment. Also knowing the solution - which is not the implementation - helped me a lot and I focused almost entirely on Scheme and the functional aspect.

I was able to do outside-in TDD by stubbing future functions. The stubbing was crude, I just redefined the functions in the test code. I was unhappy with this approach but it worked and I lacked in-depth knowledge of Scheme to come up with a proper way to stub functions. It seemed wrong to pass functions around according to Dependency Inversion Principle, because the called functions were low-level internals and no peer collaborators. In a way I followed Ralf Westphal's approach of True Stepwise Refinement, where he stubbed private functions. In the end I thought about deleting (some) unit tests of the internal functions but did not have any conclusive ideas how to do so.

P.S. for Claus
We had agreed to code together and do some serious product development, and again I spent time on weird ideas like Scheme or Assembly. I am sorry.

No comments: