7 August 2015

How to Unit-Test Assembly

After my last trip into Assembly I got the feedback that I should have used TDD, especially as it was supposed to be a code kata. (Thank you Emmanuel Gaillot for reminding me of my duties.) And yes, I should have. But as I said, I could not find any real unit testing support, at least not without resorting to C or C++ unit testing frameworks. On the other hand, creating an xUnit implementation is a great exercise to get to know a language. So I decided to create my own, minimal unit testing framework for Windows IA-32 Assembly using NASM. (The following code uses the stdcall calling convention because it is used in the Microsoft Win32 API anyway.)
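
As a quick reminder, stdcall means that the arguments are pushed right to left and that the callee removes them from the stack again. An illustrative call site - the names length and buffer are placeholders, not code from the framework - looks like this:
        push    length          ; rightmost argument first
        mov     eax, buffer
        push    eax             ; leftmost argument last
        call    _print          ; the callee removes both arguments itself (ret 8)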

Failure
So what are the essential features for unit testing support? I believe the most important one is to mark a test as failed. I also want to stop executing the test in case of failure, like JUnit does. This is the fail() method:
%macro fail 0
        log msg_failed, msg_failed_len

        leave
        ret 0 ; test method never has any arguments
%endmacro

msg_failed:     db 'FAILED'
msg_failed_len  equ $ - msg_failed
In fact it is a NASM macro which gets expanded into each invocation. For simplicity it does not support a message argument right now, hence zero macro parameters. It uses the log macro, which prints the given string to Standard Out, just like _printGrid of my Assembly Minesweeper. Then it resets the stack frame (leave) and returns from the current function. Because all test methods take no arguments and fail's code gets copied into each test method, this skips further execution of the test method and returns immediately to the calling code, i.e. the test runner. Counting the failed tests inside fail would be straightforward, but I will leave that for later.
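
As a sketch of that deferred feature, the counting could look something like this - failed_count is a made-up name and not part of asmUnit yet:
%macro fail 0
        log     msg_failed, msg_failed_len
        inc     dword [failed_count]    ; remember that a test failed
        leave
        ret 0                           ; test method never has any arguments
%endmacro

failed_count:   dd 0                    ; number of failed tests so far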

Assertion
Next I need assertions to check my expectations, at least an assert_equals,
%macro assert_equals 2
        cmp     %1, %2
        je      .%1_%2_end
        fail
.%1_%2_end:
        call    _show_progress
%endmacro
This has to be a macro as well so that fail works correctly, as explained above. But as the macro gets expanded into the calling code, the local label .%1_%2_end has to be unique. While different assertions are possible, a specific assertion like assert_equals eax, 2 can only be used once per test method. This is a weird constraint, but I do not mind as my tests tend to have a single assertion per test method anyway. (Later I changed the macro to use a macro-local label %%_end which removed this problem.)
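
Reconstructed from that description - the changed version is not listed in this post - the macro might look like this:
%macro assert_equals 2
        cmp     %1, %2
        je      %%_end          ; macro-local label, unique per expansion
        fail
%%_end:
        call    _show_progress
%endmacro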

The function _show_progress just prints a single dot to show the progress during test execution. It needs to preserve all registers because it is called in the middle of my test code.
_show_progress:
        push    eax             ; save all registers because the
        push    ebx             ; surrounding test code relies on them
        push    ecx
        push    edx

        push    dot_len         ; length argument for _print
        mov     eax, dot
        push    eax             ; address of the dot
        call    _print          ; _print removes its own arguments (stdcall)

        pop     edx             ; restore registers in reverse order
        pop     ecx
        pop     ebx
        pop     eax

        ret

dot:    db      '.'
dot_len equ     1

Test Cases
For my test cases I want descriptive names. Using long, regular labels for the function names and the assertion macros, I write my first test:
_should_add_one_and_one_to_be_two:
        create_local_variables 0

        mov     eax, 1
        add     eax, 1
        assert_equals eax, 2

        leave
        ret
The test succeeds and prints a single dot to Standard Out. The create_local_variables macro creates the stack frame and local variables if needed. If a test fails, e.g.
_should_add_one_and_two_to_be_three:
        create_local_variables 0

        mov     eax, 1
        add     eax, 2
        assert_equals eax, 2 ; (6)

        ; not reached
        hlt

        leave
        ret
it prints FAILED and stops execution in line 6.
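
The create_local_variables macro itself is not shown in this post. A minimal sketch of how it might look, assuming its single parameter is the number of bytes to reserve for local variables:
%macro create_local_variables 1
        push    ebp             ; save the caller's frame pointer
        mov     ebp, esp        ; set up the new stack frame
        %if %1 > 0
        sub     esp, %1         ; reserve room for local variables
        %endif
%endmacro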

Test Runner
Finally I want to run all my tests one by one. In this simple case I just call the test methods from the main entry point, followed by printing DONE at the end.
global  _main

_main:

        ; welcome message
        log     msg_hello, msg_hello_end - msg_hello

        call    _should_add_one_and_one_to_be_two
        call    _should_add_one_and_two_to_be_three
        ; (10)

        ; completion message
        log     msg_done, msg_done_end - msg_done

        jmp     _exit

msg_hello:      db 'HELLO asmUnit'
msg_hello_end:

msg_done:       db 'DONE'
msg_done_end:
This gets the job done but I am not happy with it. I need to add new test method invocations by hand in line 10. Because I want to see the new test fail first, I cannot forget to add the call of the new test method, but automatic test case discovery would still be more convenient.

Before and After
What else does a unit testing framework need? For complex cases, before and after test execution hooks might be necessary. A before macro defines the proper label and the begin_test macro just calls it.
%macro before 0-1 0
        %ifdef ctx_before
                %error "before used more than once"
        %endif

        %define ctx_before 1
_before_hook:
        create_local_variables %1
%endmacro

%macro begin_test 0-1 0
        create_local_variables %1
        %ifdef ctx_before
                call _before_hook
        %endif
%endmacro
The after and end_test macros work accordingly, just for the test clean-up. The actual test code stays the same; only the macro calls change.
_should_add_one_and_one_to_be_two:
        begin_test

        mov     eax, 1
        add     eax, 1
        assert_equals eax, 2

        end_test
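The end_test macro is not spelled out in the post; one way it might be defined, mirroring begin_test and taking over the leave/ret from the earlier version of the tests (the _after_hook label is my assumption, mirroring _before_hook):
%macro end_test 0
        %ifdef ctx_after
        call    _after_hook     ; run the optional clean-up hook
        %endif
        leave                   ; tear down the test's stack frame
        ret                     ; back to the test runner
%endmacro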
While this works well, I am unhappy with all the (NASM-specific) macros in the code. There remains little real Assembly, but I guess that is the way to go. There are also plenty of missing features, but this minimal first version gets me started. I will see how it works out in my next Assembly kata. (The complete source of asmUnit is here.)

31 July 2015

External Code Reviews

As Code Cop, I am sometimes asked to perform an independent code review. Before I start working on it, I ask for the goal of the review. What kind of results are expected and what will happen with these results? Which decisions will be taken depending on the outcome? Usually people know when the quality of their work is not that great, so why bother?

For example, I was asked to review a code base because the developers had problems with their client. They were arguing about costs and regressions. I did not approve of this reason, because in such a situation there is always a loser. When doing a code review, I do not want to shift blame; I want to make people aware of potential improvements and help developers learn and grow. So I persuaded my client to change the goal of the audit. Instead of looking for blame, I worked together with the whole team to come up with a list of necessary steps to correct their customer's problems. Understanding the way that software grows over time, it was reasonable to charge the customer for at least some of that work. It became a win-win situation.

A Code Review is an Opportunity to Improve
So the goal of the review is highly relevant for the mechanism and success of such an external code audit. There must be a dialogue with the team and the audit's findings must be used for constructive feedback. That means discussing the results with development and creating a plan for how to fix the critical issues. My findings are always concrete and raw - I do not like to create management summaries or elaborate slide decks. (When such reports were needed, development managers worked with me to create the slides they needed from my raw results.)

Sometimes I start a coaching engagement with a quick review of the team's code. It helps me to see the team's maturity and typical problems. With this knowledge I can target the top issues in Coding Dojos right from the beginning. Also when I help teams during re-engineering efforts, at least a partial review of the code base helps me to understand the problems we try to solve.

But Is It Practical?
Checking the code quality is difficult. I am not able to look at every line in the system; that is impossible. So I use static code analysis tools and metrics to get an idea about the code. But metrics are controversial because most of them do not measure what I am really looking for - clean code which is readable and can be maintained easily. The actual review of code is always based on samples. This is far from ideal.

An outsider can never know all the technologies and the reasons why a large piece of software is the way it is today. That is why the developers are essential. To get usable results I rely on them. I ask them about their code, let them create lists of their best and worst classes and discuss what I find and why I do not like it. To work like that, the audit must be a friendly action, performed in cooperation with the developers. When the goal of the review is to improve the whole project and developers are asked for their input right from the start, everybody is on board.

What I do
For a code review of a large code base, e.g. 500k to 800k lines of code, I try to get as much information about the code as possible. First I read about the used technologies if I am not familiar with them. Then I try to build the complete project and play around, opening classes randomly and following references. This is just a warm-up for getting used to the code base.

When I feel comfortable in the code, I start with the heavy lifting: First I run some tools to get metrics, e.g. JavaNCSS to collect the size and complexity of the code, or Chidamber and Kemerer metrics to calculate coupling and cohesion numbers. Then I use tools to scan for smelly code or potential bugs, e.g. Checkstyle, PMD and others. These tools are very common, and sometimes they do not find much - because the developers already use them - which is a great sign. Unfortunately that has only happened once so far. Then I move from these line-based analysis tools to higher-level ones. I look for violation hotspots, code duplication, unused classes, Singletons (because I do not like them) and cyclic dependencies, to name a few. There are many tools for Java, but depending on the programming language there might be fewer tools available. Anyway, I still try to use at least one tool of each category.

The hard work is the manual part. I verify the critical findings of all tools, which includes a lot of navigation. Then I run some semi-automatic analysis, usually by searching. I look for compiler warnings, TODO markers, @SuppressWarnings, (too many) casts and instanceofs, catch blocks, ignored tests, useless JavaDoc comments and other things. By checking each finding I have covered a lot of code by this point, although somewhat out of context. Finally I select a small sample of classes and read them from top to bottom.

In the meantime, I schedule time to talk to the developers. I ask them about known issues and where I should look in particular. As I said, people usually know where the skeletons are hidden. I ask them to show me how they build their software, how they use Continuous Integration and if they have coding and design conventions. If they have time I ask them to run some of the analysis tools I mentioned above and remove false positives. I do not have to do everything myself and every team needs someone who is familiar with these tools to use them from time to time.

Before I report the result to the customer, usually a manager of some kind, I present my findings to the developers and discuss my conclusions. I explain the results in detail until they fully agree with me. Sometimes I drop findings if the team explains their conventions. I make sure that my final report only contains real issues and has full support of the development team. Usually the team already starts working on the critical issues before the audit is officially concluded.

20 July 2015

Write the worst code you can

At the last Code Retreat I was lucky to co-facilitate the event together with Alexandru Bolboaca from Romania. He wrote a summary of the event in Vienna in his blog, so I will not describe it myself. Instead I will describe one constraint he introduced. A constraint, also known as an activity, is a challenge during a kata, coding dojo or code retreat designed to help participants think about writing code differently than they would otherwise. I have written about other constraints before, e.g. No naked primitives.

The Fun Session
The last session of a Code Retreat is usually a free session that is supposed to close a great day, and it should be fun. Alex likes to give participants many options and he came up with a list of different things they could do. While I was familiar with all the pairing games, e.g. Silent Evil Pairing, which is also called Mute with Find the Loophole, I had never tried inverting a constraint.

Write the worst code you can
Alex explained that there were many ways to write bad code: all methods could be static, only mutable global state, bad names, too big or too small units (methods, classes), only primitives, only Strings, to name just a few. Some of these bad things are available as their own (inverted) constraints, e.g. all data structures must be hash tables. (PHP, I am looking at you. ;-) Bad code like this can get out of hand easily; imagine writing the whole Game of Life in a single, huge method.

But how could this teach us anything? A regular constraint is an exaggeration of a fundamental rule of clean code or object oriented design. People have to figure out the rule and work hard to meet the constraint. The inverted constraint is similar. People have to think about what they would do, and then do the opposite, exaggerating as much as possible. As a bonus, most of them get stung by the crap they just created a few minutes ago. And most developers really enjoy themselves working like that. Maybe they have too much fun, but I will come back to that later.

Armament... or maybe not
I used the constraint in several in-house Coding Dojos when the teams asked for fun sessions. Here is what I learned: the constraint is not hard. Some people get creative, but in general it is too easy to write bad code. Most "worst" code I saw had inconsistent naming, a lot of duplication, primitives all over the place and bad tests. In general it looked almost like the day-to-day code these teams deliver. I particularly loved a dialogue I overheard, where one developer wrote some code and proclaimed it ugly while his pair denied its ugliness.

One participant complained that he wanted to learn how to write clean code and that he saw this worst-case code every day anyway. He was really frustrated. (That is why I said not all people like this constraint.) This also points to a potential problem with inverted constraints in general - practising doing it wrong might not be a good idea after all.

Conclusion
I am unable to decide if this constraint is worth using or not. I like the idea of inverted constraints, and this one is definitely fun and probably good to use once in a while. But I believe the constraint lacks focus and is therefore too easy to meet. People should not deliver code during a practice session that just looks like their production code. I want them to work harder ;-) I will try more focused, inverted constraints next. Allowing only static methods might be a good start. On the other hand, code created when following the constraint to only use Strings as data structures is too close to the regular (bad) code I see every day. Therefore a good inverted constraint would be one that takes a single, bad coding style to the extreme. Allowing only one character for names comes to mind, which is an exaggeration of bad naming - or even a missing-tool constraint, because it renders naming non-existent.

Have you worked with inverted constraints? If so, please share your constraints and opinions about them in the comments below.