21 June 2020

Conway's Squasher Coroutine

Since some time I am researching coroutines. Wikipedia says that a coroutine is a computer program component that generalises subroutines, allowing execution to be suspended and resumed. [...] Subroutines are special cases of coroutines. When subroutines are invoked, execution begins at the start, and once a subroutine exits, it is finished; an instance of a subroutine only returns once, and does not hold state between invocations. By contrast, coroutines can exit at any time, and execution may later return to these points. [...] A coroutine instance holds state, and varies between invocations. Coroutines are useful for many things. For example the Kotlin programming language lists several uses cases including asynchronous computations, generators and implementing state machines in a sequential way. More and more languages provide them.

The Paper
According to Donald Knuth, Melvin E. Conway coined the term coroutine in 1958 when he applied it to the construction of an assembly program. He first published it 1963 in his paper Design of separable transition-diagram compiler. Conway states that the coroutine idea was concurrently developed by him and by Joel Erdwinn. The paper is about implementing a COBOL compiler, the first two pages introduce the coroutine together with its implementation in assembly for the Burroughs model 220, a late vacuum-tube computer.

Code Example
Conway shows an example, the asterisk squasher subroutine, in both subroutine and coroutine form. The mentioned assembly commands are unknown to me and the flowcharts are not 100% readable, so I decided to reimplement the example in modern assembly to see what is going on. The asterisk squasher is a program which reads cards and writes the string of characters it finds, column 1 of card 1 to column 80 of card 1, then column 1 of card 2, and so on, with the following wrinkle: every time there are adjacent asterisks they will be paired off from the left and each "**" will be replaced by the single character "^". This operation is done with the exponentiation operator of FORTRAN and COBOL.

Asterisk Squasher Subroutine
The problem is that out - the output area of the SQUASHER routine - only holds a single byte, which means the routine can only return and subsequentially write a single character. When looking for the second asterisk, the routine reads a second element and thus needs somehow to return it on the next call unless it was a "*". Conway continues that the simple requirement of compressing asterisks requires the introduction of a switch which bifurcates the subroutine and selects the part to be used at each call depending on the situation at the last call. The reason for the switch is that each call of the subroutine must result in output of exactly one character. The second read character is stored in a (module) global variable t2 until the next call. Here is its flowchart:

Figure 1: Asterisk squasher subroutine with environment
I use my trusted Windows assembly setup for Intel IA-32 assembly. The assembly is in NASM format. First define the data:
card_len equ    80

switch: resd    1
%define ON      dword 1
%define OFF     dword 0

i:      resd    1
card:   resb    card_len

t1:     resd    1
t2:     resd    1

out:    resd    1
Let's warm up with the code for RDCRD, which reads 80 characters of a card. The code strictly follows the flowchart above.
RDCRD:
        mov     eax, [i]
        cmp     eax, card_len
        jne     .exit

        mov     [i], dword 0

        ; Omitted using Windows system library functions
        ; _GetStdHandle@4 and _ReadFile@20 to read 80
        ; characters into [card].

.exit:
        ret
The main method is SQUASHER with the switch variable.
SQUASHER:
        mov     eax, [switch]
        cmp     eax, OFF
        je      .off
.on:
        mov     eax, [t2]
        mov     [out], eax

        mov     [switch], OFF
        jmp     .exit

.off:
        call    RDCRD

        mov     esi, [i]
        xor     eax, eax
        mov     al, [card + esi]
        mov     [t1], eax

        inc     esi
        mov     [i], esi

        mov     eax, [t1]               ; redundant, value still in register
        cmp     eax, '*'
        jne     .not_equal_ast

.equal_ast:
        call    RDCRD

        mov     esi, [i]                ; redundant, value still in register
        xor     eax, eax
        mov     al, [card + esi]
        mov     [t2], eax

        inc     esi
        mov     [i], esi

        mov     eax, [t2]               ; redundant, value still in register
        cmp     eax, '*'
        jne     .not_equal_second_ast

.equal_second_ast:
        mov     [t1], dword '^'
        jmp     .not_equal_ast

.not_equal_second_ast:
        mov     [switch], ON
        jmp     .not_equal_ast

.not_equal_ast:                         ; label 1
        mov     eax, [t1]
        mov     [out], eax

.exit:
        ret
There is nothing special happening in the code above. There are the two main branches .on and .off. To run the whole program on a single test card I need a WRITE and a main method. WRITE iterates a card and prints the result of each call to SQUASHER to standard out.
WRITE:
.loop:
        call    SQUASHER

        ; Omitted using Windows system library functions
        ; _GetStdHandle@4 and _WriteFile@20 to write 1
        ; character from [out].

        mov     eax, [i]
        cmp     eax, card_len
        jne     .loop

        ret

; ----------------------------------------
        global  _main
_main:
        ; set up switch
        mov     [switch], OFF

        ; set up global data
        mov     [i], dword card_len

        call    WRITE

        push    0
        call    _ExitProcess@4
Asterisk Squasher as Coroutine
Conway writes further in his paper: The coroutine approach to the same problem accomplishes the switching job implicitly by use of the subroutine calling sequence. When coroutines A and B are connected so that A sends items to B, B runs for a while until it encounters a read command, which means it needs something from A. Then control is transferred to A until it wants to "write," whereupon control is returned to B at the point where it left off. In this situation A is the SQUASHER and B is the WRITE coroutine. The following flowchart shows SQUASHER when both it and the using program (WRITE) are coroutines.

Figure 2: Asterisk squasher as coroutine
Coroutine in Assembly?
While some architectures provide support for coroutines, according to the Art of Assembly Language 80x86 CPUs do not provide a COCALL instruction. Conway describes the coroutine linkage on the Burroughs 220 as follows: The UNCONDITIONAL BRANCH instruction BUN A works by placing its address A into the P-register (the Instruction Pointer). The STORE P instruction STP B places the contents of P plus one into the address part of the contents of location B. The standard subroutine call is
1:      STP EXIT                        ; store 3 into target address of address EXIT
2:      BUN ENTRANCE                    ; jump to address ENTRANCE
3:
where location EXIT contains a BUN instruction whose address will be altered by the STP instruction in the call. (So this is self-modifying code and the return of the subroutine is then EXIT: BUN 3 or something like that.) A pair of subroutines becomes a pair of coroutines by adding to each an isolated BUN instruction which we can call its router, and by changing the addresses of the STP and BUN calls as follows: when coroutine A calls coroutine B the call is
        STP AROUTER
        BUN BROUTER
That seems a small change for the Burroughs but how could I recreate that?

Coroutine in Assembly!
The BUN is a JMP on Intel. Unfortunately not all instructions are one byte, so using the current IP + 1 as done by STP does not work. I tried to follow the original idea with a short macro. When coroutine A calls coroutine B the call is coro A B,
%macro coro 2
        mov     ebx, %%_cnt             ; basically IP + 1
        mov     [route_%1], ebx         ; store it in target of router A
        jmp     %2                      ; jump to router B
%%_cnt: nop
%endmacro

route_SQUASHER: resd 1
route_WRITE:    resd 1

SQUASHER:
        jmp     [route_SQUASHER]

WRITE:
        jmp     [route_WRITE]
This needs storage for the route of each coroutine. The call of the macro does not look so good as it needs the name of the current coroutine as first argument for the macro to use the right route storage. If each coroutine would be its own module, I could drop that and just use [route]. The router variables [route_*] will be initialised at startup. The code of the two routines SQUASHER and WRITE changes slightly, strictly following the second flowchart above.
; router
SQUASHER:
        jmp     [route_SQUASHER]

; coroutine
SQUASHER_CORO:                          ; label 1
        call    RDCRD

        mov     esi, [i]
        xor     eax, eax
        mov     al, [card + esi]
        mov     [t1], eax

        inc     esi
        mov     [i], esi

        mov     eax, [t1]               ; redundant, value still in register
        cmp     eax, '*'
        jne     .not_equal_ast

.equal_ast:
        call    RDCRD

        mov     esi, [i]                ; redundant, value still in register
        xor     eax, eax
        mov     al, [card + esi]
        mov     [t2], eax

        inc     esi
        mov     [i], esi

        mov     eax, [t2]               ; redundant, value still in register
        cmp     eax, '*'
        jne     .not_equal_second_ast

.equal_second_ast:
        mov     [t1], dword '^'
        jmp     .not_equal_ast

.not_equal_second_ast:
        mov     eax, [t1]
        mov     [out], eax
        coro    SQUASHER, WRITE

        mov     eax, [t2]
        mov     [out], eax
        coro    SQUASHER, WRITE

        jmp     SQUASHER_CORO

.not_equal_ast:                         ; label 2
        mov     eax, [t1]
        mov     [out], eax
        coro    SQUASHER, WRITE

        jmp     SQUASHER_CORO

; ----------------------------------------
; router
WRITE:
        jmp     [route_WRITE]

; coroutine
WRITE_CORO:
.loop:
        coro    WRITE, SQUASHER

        ; Omitted using Windows system library functions
        ; _GetStdHandle@4 and _WriteFile@20 to write 1
        ; character from [out].

        mov     eax, [i]
        cmp     eax, card_len
        jne     .loop

        ret

; ----------------------------------------
        global  _main
_main:
        ; set up coroutine routers
        mov     ebx, SQUASHER_CORO
        mov     [route_SQUASHER], ebx

        mov     ebx, WRITE_CORO
        mov     [route_WRITE], ebx

        ; set up global data
        mov     [i], dword card_len

        call    WRITE

        push    0
        call    _ExitProcess@4

Conslusion
I hope that clarifies how the first coroutine presented in 1963 worked. Re-coding the example was a lot of work and in the end it paid off because everything is clear now, even more so after I wrote this blog post about it. This is the golden triangle of learning: theory (e.g. reading about it), practice (e.g. creating it) and finally teaching (documenting and sharing it).

Full Source Code
After I published this article Dmitry Kandalov pushed me to make the code available on GitHub. And then we went to port it to macOS, so he could play with it as well. To compile and run the code you need GNU make, NASM and a linker depending to your platform. Thanks to Samir Talwar's Smoke, it even comes with tests. Give it a try: Conway's Squasher Coroutine @ GitHub.

12 June 2020

A New Dog

Last year our dog Ruby died. We took some time to cope with the loss and now is the time for a new dog. After we loved a Great Dane once, there is no other dog we want. Here is a picture of our new puppy Ronja:

Great Dane puppy Ronja
While I chose the name of our previous dog - Ruby - my wife chose the name of this one. Ronja or Ronia is the daughter of the leader of a clan of robbers. Obviously she has little manners and does what she likes - and that is exactly the nature of our little Ronja ;-)

Raising a Puppy
Like software development, raising a puppy is a detailed process which can be improved continuously. Today we know more than ten years ago and we want to do much better. There are many misconceptions how to raise dogs and I learnt again that knowing how to handle certain situations is not enough - I have to practice them. For example, grabbing the puppy on time so it will not pee on the carpet is a skill. If you ever plan to raise a puppy I recommend to check out the books by Ian Dunbar, BEFORE You Get Your Puppy and AFTER You Get Your Puppy. These are small books, fun and easy to read. Both books are provided as e-books courtesy of simpawtico dog training in the downloads section. And even if you do not own any dogs, reading these books will help you understand why dogs behave how they do - especially if they have been trained badly.

BEFORE You Get Your Puppy book cover AFTER You Get Your Puppy book cover

4 June 2020

Flashcards and Microlearning

I like learning in all its aspects. A large part of software work is hands-on, so I experiment a lot and work through old and new code katas alike. For theory I rely on classic books like Clean Code. Sometimes I struggle to internalise theoretical knowledge and take notes, create extracts, mind maps or sketch notes. For material I need to memorise I like (digital) flashcards.

Applications of Flashcards
I use flashcards for a lot of different things.
  • The common case is to study tiny facts like words or numbers. IDE shortcuts and spelling alphabets are in this category. These flashcards work very well for me.
  • Then I tried visualising keys, which did not work as well.
  • I also use them to remember key points from books, e.g. The Little Schemer or Gerald M. Weinberg's Secrets of Consulting. Key learnings from books, i.e. phrases or sentences, are more difficult to remember. I am still experimenting with these - maybe flashcards are not ideal here.
  • In the past I created decks of cards to accompany some of my training workshops, e.g. Design Patterns. While participants were enthusiastic about them, I am pretty sure they did not use them to deepen their newfound knowledge. People use different phones, so the availability of flashcard apps is also an issue.
  • Even when I fail to study the cards later, creating them is a learning experience on its own. I have to collect and structure the material, formulate the questions and find precise answers. For example I created cards around Coupling and Cohesion which helped me understand more of these two concepts.
Micro GardenMicrolearning
By researching flashcards, I came across Microlearning. This is kind of a buzz word - everything is micro today, e.g. Micro-Workouts. (You practice for five minutes a day and then the exercise gear folds under your bed ;-) Wikipedia says that Microlearning deals with relatively small learning units and short-term learning activities. [...] In a wide sense, Microlearning can be understood as a metaphor which refers to micro aspects of a variety of learning models, concepts and processes. It is a new concept and there is no clear definition. It contains a lot of things. Besides the obvious reading, listening or watching short pieces of information, Microlearning activities include flashcards, quizzes, answering multiple-choice questions, micro games and more.

Conclusion
I was surprised to learn that sorting and organising learning content like tagging it (e.g. Social Bookmarking) is considered Microlearning. I like sorting and connecting information, I even sort my code katas. The most fun activity listed on Wikipedia is composing a haiku or a short poem. I did compose a poem in the past but never considered it a tool for learning. I like the idea. Maybe I will write a poem about TDD or Micro Services in the future.

I do not consider a whole deck of flashcards Microlearning. A whole deck is so much more as it contains all the aspects and details about a certain topic. In addition, creating it takes several hours. I am seeing that because I create most of my decks myself. Probably I am putting too much information into them as well. Sometimes I am overwhelmed by my own questions. For example my Design Patterns deck contains more than 360 questions including class diagrams and all. On the other hand, learning a few cards now and then - especially from an unknown deck - is fine.

Listening and watching short audio or video recordings is Microlearning, too. I like to listen to podcasts, especially if each episode is focused and not too long. That means it is shorter than ten minutes. Short episodes are easy to consume between other activities, e.g. when commuting. It is easier to stay focused. (Yes, maybe I am getting old. I have less time and energy for prolonged learning.) That is the reason I plan for episodes of eight minutes in my own podcast on Coderetreat Facilitation. Short episodes are easier to produce, which allows me to publish more often.

Bonus Material
The German Wikipedia entry on Microlearning contains a paragraph on Microlearning in the context of software development. It lists Test Driven Development as an example. This is an interesting angle. When the test - an assumption - goes green, the implementation moved forward a micro step and the developer knows a tiny bit more about the system.