31 July 2015

External Code Reviews

As Code Cop, I am sometimes asked to perform an independent code review. Before I start working on it, I ask for the goal of the review. What kind of results are expected and what will happen with these results? Which decisions will be taken depending on the outcome? Usually people know when the quality of their work is not that great, so why bother?

For example I was asked to review a code base because the developers had problems with their client. They were arguing about costs and regressions. I did not approve this reason, because in such a situation there is always a loser. When doing a code review, I do not want to shift blame, I want to make people aware of potential improvements and help developers learn and grow. So I persuaded the client to change the goal of the audit. Instead of looking for blame, I worked together with the whole team to come up with a list of necessary steps to correct their customer's problems. Understanding the way that software grows over time, it was reasonable to charge the client for at least some of that work. It became a win-win situation.

A Code Review is an Opportunity to Improve
So the goal of the review is highly relevant for the mechanism and success of such an external code audit. There must be a dialogue with the team and the audit's findings must be used for constructive feedback. That means discussing the results with development and creating a plan how to fix the critical issues. My findings are always concrete and raw - I do not like to create management summaries of elaborate slide decks. (When such reports were needed, development managers worked with me to create the slides they needed from my raw results.)

Sometimes I start a coaching engagement with a quick review of the team's code. It helps me to see the team's maturity and typical problems. With this knowledge I can target the top issues in Coding Dojos right from the beginning. Also when I help teams during re-engineering efforts, at least a partial review of the code base helps me to understand the problems we try to solve.

But Is It Practical?
Checking the code quality is difficult. I am not able to look at every line in the system, that is impossible. So I use static code analysis tools and metrics to get an idea about the code. But metrics are controversial because most of them do not measure what I am really looking for - clean code which is readable and can be maintained easily. The actual review of code is always based on samples. This is far from ideal.

An outsider can never know all technologies and reasons why a large piece of software is like it is today. That is why the developers are essential. To get usable results I am relying on them. I ask them about their code, let them create lists of their best and worst classes and discuss what I find and why I do not like it. To work like that, the audit must be a friendly action, performed in cooperation with the developers. When the goal of the review is to improve the whole project and developers are asked for their input right from the start, everybody is on board.

One Does Not Simply Review 720000 LoCWhat I do
For a code review of a large code base, e.g. 500 to 800k lines of code, I try to get as much information about the code as possible. First I read about the used technologies if I am not familiar with them. Then I try to build the complete project and play around, opening classes randomly and following references. This is just warm up for getting used to the code base.

When I feel comfortable in the code, I start with the heavy lifting: First I run some tools to get metrics, e.g. JavaNCSS to collect the size and complexity of the code, or Chidamber and Kemerer to calculate coupling and cohesion numbers. Then I use tools to scan for smelly code or potential bugs, e.g. Checkstyle, PMD and others. These tools are very common and it happens that they do not find much - because the developers already use them - which is a great sign. But unfortunately it only happened once till now. Then I move from these line based analysis tools to higher level ones. I look for violation hotspots, code duplication, unused classes, Singletons (because I do not like them) and cyclic dependencies to name a few. There are many tools for Java but depending on the programming language, there might be less tools available. Anyway I still try to use at least one of each category.

The hard work is the manual part. I verify the critical findings of all tools, which includes a lot of navigation. Then I run some semi-automatic analysis, usually by searching. I look for compiler warnings, TODO markers, @SuppressWarnings, (too many) casts and instanceofs, catch blocks, ignored tests, useless JavaDoc comments and other things. As I check each finding, I have covered a lot of code so far - although somehow out of context. Finally I select a small sample of classes and read them from top to bottom.

In the meantime, I schedule time to talk to the developers. I ask them about known issues and where I should look in particular. As I said, people usually know where the skeletons are hidden. I ask them to show me how they build their software, how they use Continuous Integration and if they have coding and design conventions. If they have time I ask them to run some of the analysis tools I mentioned above and remove false positives. I do not have to do everything myself and every team needs someone who is familiar with these tools to use them from time to time.

Before I report the result to the customer, usually a manager of some kind, I present my findings to the developers and discuss my conclusions. I explain the results in detail until they fully agree with me. Sometimes I drop findings if the team explains their conventions. I make sure that my final report only contains real issues and has full support of the development team. Usually the team already starts working on the critical issues before the audit is officially concluded.

20 July 2015

Write the worst code you can

Global Day of Coderetreat 2014
Last Code Retreat I was lucky to co-facilitate the event together with Alexandru Bolboaca from Romania. He wrote a summary of the event in Vienna in his blog, so I will not describe it myself. I rather describe one constraint he introduced. A constraint, also known as an activity, is a challenge during a kata, coding dojo or code retreat designed to help participants think about writing code differently than they would otherwise. I have written about other constraints before, e.g. No naked primitives.

The Fun Session
The last session of a Code Retreat is usual a free session that is supposed to close a great day and it should be fun. Alex liked to gives participants many options and he came up with a list of different things they could do:Alex Bolboaca Fun SessionWhile I was familiar with all the pairing games, e.g. Silent Evil Pairing, which is also called Mute with Find the Loophole, I had never tried inverting a constraint.

Write the worst code you can
Alex explained that there were many ways to write bad code. All methods could be static, only mutable global state, bad names, too big or too small units (methods, classes), only primitives, only Strings, to just name a few. Some of these bad things are available as their own (inverted) constraints, e.g. all data structures must be hash tables. (PHP I am looking at you ;-) Bad code like this can get out of hand easily, imagine writing the whole Game of Life in a single, huge method.

But how could this teach us anything? A regular constraint is an exaggeration of a fundamental rule of clean code or object oriented design. People have to figure out the rule and work hard to meet the constraint. The inverted constraint is similar. People have to think about what they would do, and then do the opposite, exaggerating as much as possible. As a bonus, most of them get stung by the crap they just created a few minutes ago. And most developers really enjoy themselves working like that. Maybe they have too much fun, but I will come back to that later.

Armament... or maybe not
I used the constraint in several in-house Coding Dojos when the teams asked for fun sessions. Here is what I learned: The constraint is not hard. Some people get creative but in general it is too easy to write bad code. Most "worst" code I saw had inconsistent naming, a lot of duplication, primitives all over the place and bad tests. In general it looked almost like the day to day code these teams deliver. I particularly loved a dialogue I overheard, when one developer wrote some code and proclaimed it as ugly while his pair denied its ugliness.

One participant complained that he wanted to learn how to write clean code and that he saw this worst case code everyday anyway. He was really frustrated. (That is why I said not all people like this constraint.) And this is also pointing out a potential problem of inverted constraints in general - practising doing it wrong might not be a good idea after all.

Conclusion
I am unable to decide if this constraint is worth using or not. I like the idea of inverted constraints and this one is definitely fun and probably good to use once in a while. But I believe the constraint is lacking focus and is therefore too easy to meet. People should not deliver code during a practise session that just looks like their production code. I want them to work harder ;-) I will try more focused, inverted constraints next. Allowing only static methods might be a good start. On the other hand, code created when following the constraint to only use Strings as data structures is too close to the regular (bad) code I see every day. Therefore a good inverted constraint would be one that takes a single, bad coding style to the extreme. Allowing only one character for names comes to my mind, which is an exaggeration of bad naming - or even a missing tool constraint, because it renders naming non-existing.

Have you worked with inverted constraints? If so, please share your constraints and opinions about them in the comments below.

17 July 2015

Using Hamcrest Matchers With Composer

gotta matchTo help my friends of the Vienna PHP User Group to get started with Coding Dojos, I created a project template. Obviously the used programming language was PHP and PHPUnit, the PHP xUnit implementation, was required.

Composer
Setting up a new project almost always starts with dependency management and I used Composer to define my dependencies. Composer was configured by its composer.json,
{
  "name": "codecop/CodingDojo-PHP",
  "description": "Coding Dojo template",
  "license": "BSD",
  "require": {
  },
  "require-dev": {
    "phpunit/phpunit": "4.5.*"
  }
}
After verifying my composer installation with composer --version I downloaded PHPUnit with composer install. PHPUnit and its transitive dependencies were installed into the vendor directory as expected.

PHPUnit
Next I configured PHPUnit with its phpunit.xml,
<?xml version="1.0" encoding="UTF-8"?>
<phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://schema.phpunit.de/4.5/phpunit.xsd"
    backupGlobals="false"
    colors="true"
    beStrictAboutTestsThatDoNotTestAnything="true"
    beStrictAboutOutputDuringTests="true"
    beStrictAboutTestSize="true"
    beStrictAboutTodoAnnotatedTests="true"
    verbose="true">

    <testsuites>
        <testsuite name="All Tests">
            <directory suffix="Test.php">test</directory>
        </testsuite>
    </testsuites>
</phpunit>
This told PHPUnit to load all the *Test.php files inside the test directory for test classes. I especially like the beStrictAbout* flags and enabled them all. These flags warn about smelly tests, e.g. test methods without any assertions. I ran PHPUnit with ./vendor/bin/phpunit to verify my setup. It did not show any tests - after all this was a new and empty project. I have seen people creating an alias to run PHPUnit, but I created a (Windows) script phpunit.bat in the local directory with the same effect,
@call "%~dp0vendor\bin\phpunit" %*
Now I was ready to go and wrote some unit tests, e.g.
<?php

require 'Hello.php';

class HelloTest extends \PHPUnit_Framework_TestCase {

    /** @test */
    function shouldReturnHelloName() {
        $greeter = new Greeter();
        $this->assertEquals("Hello Peter", $greeter->greet("Peter"));
    }

}
Hamcrest Matchers
In the Java community Hamcrest Matchers are popular and they even ship with the core JUnit framework. I like Hamcrest because it allows me to write my own matchers, which make assertions much more expressive than plain assertEquals. Luckily there were some ports of it and I was happy to see a Hamcrest PHP port. I added it to composer.json,
"require-dev": {
  "phpunit/phpunit": "4.5.*",
  "hamcrest/hamcrest-php": "1.2.*"
}
and updated my installation with composer install. Hamcrest offers global functions for its matchers, which allow for shorter syntax, especially when matchers are chained together. To enable this global functions, Composer has to auto load the main Hamcrest file, which is configured using autoload-dev in composer.json,
"autoload-dev": {
  "files": ["vendor/hamcrest/hamcrest-php/hamcrest/Hamcrest.php"]
}
Using global functions has some drawbacks and is considered a bad practise in large scale projects. There are different ways to use Hamcrest PHP with Composer without loading the global functions, e.g. see Hamcrest PHP issues at GitHub. For a first time Coding Dojo I wanted to stay with the simplest way to use Hamcrest and kept the global functions.

So I was able to write my unit tests using Hamcrest matchers, e.g.
<?php

require 'Hello.php';

class HelloTest extends \PHPUnit_Framework_TestCase {

    /** @test */
    function shouldReturnHelloName() {
        $greeter = new Greeter();
        assertThat($greeter->greet("Peter"), equalTo("Hello Peter"));
    }

}
While the test above succeeded, PHPUnit refused to show me a green bar. This was because Hamcrest's assertThat was not recognised as assertion and PHPUnit marked the test as erroneous. With a heavy heart I had to remove one of PHPUnit's beStrictAbout flags,
<?xml version="1.0" encoding="UTF-8"?>
<phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://schema.phpunit.de/4.5/phpunit.xsd"
    backupGlobals="false"
    colors="true"
    beStrictAboutTestsThatDoNotTestAnything="false"
    beStrictAboutOutputDuringTests="true"
    beStrictAboutTestSize="true"
    beStrictAboutTodoAnnotatedTests="true"
    verbose="true">
That was it. Finally I was ready to rock, matchers included!

(The complete project as zip is here.)

7 July 2015

Minesweeper "Near the Metal"

Minesweeper
The Minesweeper code kata is a little programming exercise focusing on a subset of the well-known Minesweeper game that comes with a certain operation system. Its assignment is to write a program which accepts an input of arbitrary mine fields, with mine locations marked, and produces the full mine field with all hint values calculated. For example, given the following 4 times 4 mine field
4 4
*...
....
.*..
....
, the same mine field including the hint numbers would look like
*100
2210
1*10
1110
Mine SweeperThe Challenge
Recently the local functional programming group, the Lambdaheads, announced their Minesweeper Challenge. Being a functional programming group, most of the presented solutions used Haskell, F# and other functional languages. I participated and showed them my highly factored object orientated solution in Java, which surely looked as strange to them as their Haskell code was for me.

After seeing a solution in plain C, one participant joked that someone should use Assembly to show an even more low-level implementation. I agreed, it seemed like a great idea at that time, almost as cool as implementing Game of Life in XSLT. As usual I could not let got of the crazy idea. After my first steps in Windows 32 Assembly I was ready for a more complex task.

What about testing?
Although there are people test driving their Assembly, I could not find real unit testing support. The official way to go is to use C/C++ unit testing frameworks and write tests against the global functions in the generated OBJ files. I did not want to learn C this day and created (sort of) an integration test by feeding some mine field into Minesweeper's Standard Input and checking the output manually. Yes, I could have created a minimalist unit testing framework, which is a great exercise to get to know a new language by itself and I have done so in the past, just to keep the flow of TDD. I will come back to this topic later. (Thank you Emmanuel for reminding me of my duties ;-)

Getting Feedback
I wanted to get as much feedback as possible. Feedback is critical when working in an environment as unforgiving as Assembly. In the beginning my integration test did not help me and I split the implementation of my Minesweeper into four parts: Reading Input -> Creating Hints -> Preparing Output -> Printing Results. I decided to work backwards so I could use the result printing to give me feedback about the other stages.

Printing Results
In the beginning the mine field's grid would be a constant string.
max_dimension equ 99

; temp data
grid:   times max_dimension * max_dimension db 'x'
width:  dd      4
height: dd      4
Starting with Hello World I incrementally added to the output until _printGrid would print grid line by line to Standard Out. Then I added support for mine fields of any dimension, using width and height.
_printGrid:
        ; local variables, starting memory[ebp - 4]
numberOfBytesWritten equ -1 * register_size

        enter_method 1                  ; stack frame (macro)
        push    esi
        push    edi

        ; get handle for stdout
        push    STD_OUTPUT_HANDLE
        call    _GetStdHandle@4
        mov     ebx, eax                ; hStdOut

.loop_all_lines:                        ; for (edi = height; edi > 0; edi--)
        mov     esi, 0                  ; address of current line
        mov     edi, [height]           ; loop counter

.loop:
        ; write the next line
        ; BOOL WriteFile( hstdOut, message, length(message), &bytes, null);
        push    NULL
        lea     edx, [ebp + numberOfBytesWritten]
        push    edx
        push    dword [width]
        lea     eax, [grid + esi]
        push    eax
        push    ebx                     ; hstdOut
        call    _WriteFile@20

        ; write a new-line
        push    NULL
        lea     edx, [ebp + numberOfBytesWritten]
        push    edx
        push    len_cr
        push    cr
        push    ebx
        call    _WriteFile@20

.next:
        add     esi, max_dimension      ; go down to next line
        sub     edi, 1                  ; edi--
        jnz     .loop                   ; edi > 0

        pop     edi                     ; drop stack frame
        pop     esi
        exit_method
        ret

cr:
        db      13, 10                  ; Windows \n\r
len_cr  equ     $ - cr
The code above just loops all the grid's lines in esi and prints them followed by a Windows new-line each.

Creating Hints
Next I made the grid mutable by moving it into the data segment and put a real mine field into it.
section .bss

max_dimension equ 4
last_field_in_grid equ max_dimension * max_dimension - 1

grid:   resb    max_dimension * max_dimension
width:  resd    1
height: resd    1

...

_main:

        ; temp copy
        mov     [width], dword 3        ; 3x3 inside 4x4 total space
        mov     [height], dword 3

        mov     eax, [fake_grid + 0]
        mov     [grid + 0], eax
        ...

...

; test cases for solving the game
fake_grid:
        db      '---0---0---00000'      ; empty
;        db      'x--0---0---00000'      ; upper left clip
;        db      '--x0---0---00000'      ; upper right clip
Using _printGrid and these "test cases", I developed _solveGrid to count the number of mines each field is adjacent to.
_solveGrid:
        push    esi
        push    edi

.loop_all_fields_backwards:             ; for (edi = last_field_in_grid; edi >= 0; edi--)
        mov     edi, last_field_in_grid

.loopFields:
        cmp     [grid + edi], byte is_bomb ; is it a bomb? (or incremented bomb field)
        jb      .nextField              ; branch if less than bomb

        ; it is bomb, increment all neighbouring hints
.loop_all_neighbours_backwards:         ; for (ecx = 7; ecx >= 0; ecx--)
        mov     ecx, 7                  ; 8 neighbour coordinates to test

.loopNeighbours:
        mov     esi, edi                ; add to current bomb's position...
        add     esi, dword [around + ecx * register_size]
        cmp     esi, last_field_in_grid ; is the neighbour position valid?
        ja      .nextNeighbour          ; neighbour position is outside

        add     [grid + esi], byte 1    ; else add bomb count for neighbour

.nextNeighbour:
        sub     ecx, 1                  ; ecx--
        jnc     .loopNeighbours         ; ecx >= 0

.nextField:
        sub     edi, 1                  ; edi--
        jnc     .loopFields             ; edi >= 0

        pop     edi
        pop     esi
        ret

is_bomb equ     'x'

around: dd      - max_dimension - 1
        dd      - max_dimension
        dd      - max_dimension + 1
        dd      - 1
        dd        1
        dd        max_dimension - 1
        dd        max_dimension
        dd        max_dimension + 1
_solveGrid iterates all fields in grid using edi, even the ones outside the mine field. If the field is a bomb (contains an 'x'), The code loops ecx for all eight neighbours and increments their field indexed by esi. This adds one to byte value that is there, even if it is a bomb. That works as long as the bomb symbol has an ASCII value of at least 9 higher than the symbol of the empty space. In the edges of the field not all neighbours exist, so esi is clipped by last_field_in_grid which prevents array overflow. Interestingly there is no need to test for underflow, i.e. negative indices in esi, because these are also skipped by branch instruction ja. (The actual bit value of negative values is above last_field_in_grid.)

Preparing Output
After counting the adjacent bombs, grid contains the needed information but is not ready to be printed. For example hints need to be numbers.
_formatGrid:
        push    esi
        push    edi

.loop_all_fields_backwards:             ; for (edi = last_field_in_grid; edi >= 0; edi--)
        mov     edi, last_field_in_grid

.loopFields:
        cmp     [grid + edi], byte is_bomb ; is it a bomb?
        jnb      .formatBomb            ; bombs are increased as well, so we check for >=

.formatHint:                            ; it is a hint, need to make it a number
        add     [grid + edi], byte emptyToHint
        jmp     .nextField

.formatBomb:                           ; it is a bomb, reset symbol
        mov     [grid + edi], byte bomb

.nextField:
        sub     edi, 1                  ; edi--
        jnc     .loopFields             ; edi >= 0

        pop     edi
        pop     esi
        ret

is_empty equ    '-'
emptyToHint equ '0' - is_empty
bomb    equ     'B'
This resets bombs to their symbol and converts hints into decimal numbers. emptyToHint is the difference between the input character of the original mine field and the wanted hint digits. For example, if a field was incremented three times it contains '-' + 3 + '0' - '-' = '0' + 3 = '3' now, which is exactly what is need.

Reading Input
Finally I reached the fourth part of the implementation, reading the mine field from Standard Input. Reading the actual grid was similar to writing it, just the opposite direction of data flow, e.g. using _ReadFile@20 system call instead of _WriteFile@20 and skipping the new-line instead of writing it. More interesting was reading the first line containing the dimensions of the grid because these contain decimal numbers of arbitrary length.
_readDigit:
        ; parameters
fileReadHandle equ 4 + 1 * register_size
        ; local variables
numberOfBytesRead equ -1 * register_size
byteBuffer        equ -2 * register_size
number            equ -3 * register_size

        enter_method 3

        mov     [ebp + number], dword 0
.read_next_character:
        lea     edx, [ebp + numberOfBytesRead] ; lpNumberOfBytesRead
        lea     eax, [ebp + byteBuffer] ; lpBuffer
        mov     ebx, [ebp + fileReadHandle] ; hstdIn

        ; read a single character
        push    NULL
        push    edx
        push    dword 1
        push    eax
        push    ebx
        call    _ReadFile@20

        mov     eax, [ebp + byteBuffer]
        and     eax, 0xff
        cmp     eax, '0'
        jb      .finished
        sub     eax, '0'

        ; multiply by 10 and add new number
        mov     ebx, [ebp + number]
        shl     ebx, 1                  ; * 2
        mov     ecx, ebx
        shl     ecx, 2                  ; * 8
        add     ebx, ecx                ; = * 10

        add     ebx, eax
        mov     [ebp + number], ebx
        jmp     .read_next_character

.finished:
        mov     eax, [ebp + number]

        exit_method
        ret
_readDigit reads bytes from its input until it finds something that is white-space. In fact everything that has an ASCII value less than '0' is considered white-space. The currently read value is converted to a number by subtracting the ASCII value of '0' and added as next digit. Of course this is extremely simplified, as extra white-space, Windows new-lines or unexpected characters completely mess up the logic. Fortunately this is just a kata and no production code ;-)

Completed
The reading of the input completed the Minesweeper Kata in Assembly. To avoid repetition I separated the Standard In/Out handles from their actual usage and provided them as arguments to the functions. The final main entry point looked like that:
global  _main

_main:

        call    _getStdInFileHandle
        push    eax                     ; hstdOut 3 times for next calls
        push    eax

        ; read width
        push    eax
        call    _readDigit
        add     esp, register_size      ; clean up parameter
        mov     [width], eax

        ; read height
        ; 2nd eax still pushed
        call    _readDigit
        add     esp, register_size
        mov     [height], eax

        ; read grid
        ; 3rd eax still pushed
        call    _readGrid
        add     esp, register_size

        call    _solveGrid
        call    _formatGrid

        call    _getStdOutFileHandle

        push    eax                     ; hstdOut
        call    _printGrid
        add     esp, register_size

        jmp     _exit
If you want to see the whole kata, the complete source in here.

Done?
The kata is finished, but there are several issues remaining. Its design was influenced by my need for feedback through the console output. I am wondering how an implementation would look like if I had followed a strict TDD approach. I guess it would be less coupled to system calls at least. Also I am unhappy with its internal coupling. All four major functions depend on the grid and its internal structure, e.g. they need to know max_dimension. Following D.L. Parnas' criteria to decompose systems, it would be favourable to decouple the data from the different algorithms. The grid could be wrapped in its own module (OBJ) allowing only API access, which would add indirection, a lot of subroutine calls and the need to copy data around. This feels against the spirit of raw Assembly. It seems I will have to revisit Minesweeper again.

3 July 2015

Hello World Windows 32 Assembly

During my time at university I wrote a lot of Assembly for the 80x86 but stopped doing so when I moved into commercial software development. Recently - just 20 years later ;-) - I felt an urge to go back and refresh my memories. This was not the first time I felt retro, but this time I did not know hot to get started. I did not know anything about current tooling and had never created a native (Windows) application before. Usually the code I write today, be it in Java, C#, Ruby or otherwise, relies on underlying interpreters or virtual machines. (Yes I know, some people still write C and C++...)

Getting Started
Just by coincidence, one of the user groups in Vienna, the We Love Programming Languages group, dedicated one of its meetups to Assembly. There Angeliki Chrysochou showed a Hello World in Assembly which gave me a good idea what I had to do. During the following research I discovered how to write Hello World in Assembly under Windows on StackOverflow, which gave me all the information I needed to get started, or so I thought. In the end it still took me several hours to create a Hello World that worked, which is why I decided to write this summary.

Killbot Assembly LineLanguage
The modern Intel IA-32 flavour of 80x86 Assembly looks pretty much like the one I worked with in the 90's. I had no problem with that. Of course CPU architectures got more complex and you have to watch out for far more things when optimising for performance, e.g. branch prediction, pipeline stalls and so forth, but the general idea stayed the same.

The Assembler
Angeliki used NASM, the Netwide Assembler in her presentation and it looked good, so I tried it as well. It worked great and I did not check out other tools (like MASM). The command to translate (does Assembly get compiled?) an ASM file for Windows with NASM is
nasm -fwin32 helloworld.asm
This creates an OBJ file which needs to be linked.

Operation System Calls
As soon as you want to do anything regarding input, output or even exiting the current application, you need operation system calls. You have to search the MSDN Windows API Index for the calls you need and have to decorate them according to the Windows Application Binary Interface (ABI). (You can also use C functions, but that would not be pure Assembly, wouldn't it.) The decorated function name has to be external and its parameters are put on the stack in CDECL order. (In fact Windows 32 system calls are syscall.) The source of the shortest NASM Windows application is
global _main
    extern  _ExitProcess@4

    section .text
_main:
    push    0
    call    _ExitProcess@4
It took me time to figure out why ExitProcess is _ExitProcess@4 but MessageBox is _MessageBoxA@16.

The Linker
Finally the generated OBJ code has to be combined with the used system libraries into a single unified executable program. I found several ways for doing this under Windows. The native Windows way is to use link.exe from Microsoft Visual C++, e.g.
link /entry:main helloworld.obj /subsystem:console /nodefaultlib kernel32.lib
Linker, August GusThe kernel32.lib is necessary for the ExitProcess to be found. Getting the linker was a bit of a hassle because you need to install Visual C++ or at least some Windows SDK to get it. I did not want to do that, but I remembered that I had used the command line part or VC 6.0 aka VC98 to compile native extensions for Ruby 1.8.6 before the Ruby DevKit became popular. After recovering VC98.zip from the depths of my hard-disc, I enabled it by calling its vcvars32.bat. This sets the PATH, INCLUDE and LIB environment variables. For example the kernel32.lib is located in the library folder. As far as I know (from StackOverflow), vcvars32.bat is still available in VC++. I guess I would need a newer Kernel library to use the latest Windows 32 functions, but this was enough for Hello World and I was able to link my OBJ.

Another way to link under Windows is to use Unix tool ports like MinGW, a minimalist development environment for native Microsoft Windows applications. MinGW 32 comes with ld.exe, e.g.
ld -o helloworld.exe helloworld.obj -e_main -lkernel32
      --enable-stdcall-fixup %SYSTEMROOT%\system32\kernel32.dll
Again I did not want to download and install huge tools, I just wanted to create a little Assembly Hello World. This should not be so hard. As I said before, MinGW is used to compile native C extensions for Ruby since 1.8.7, and I just used the one that came with the Ruby DevKit. After setting its devkitvars.bat I was able to create my helloworld.exe.

There exist standalone linkers like GoLink which might work, but I did not check any of them after I had success with both link and ld. But in general I would prefer something small, like NASM which is just a single executable. (Edit: Yes Golink does work and is just a single executable, just what I need.)

Execution
One beauty of Assembly is the size of created application. I remember one of my smallest DOS applications was around 20 bytes (!) in total, a COM application which did not require any linking. Agreed it did not do much, just turned off Num-Lock, but it was useful to me at that time. The Hello World's object file is 445 bytes and the executable helloworld.exe created by ld is around 4kB, the one created by link 12 kB. (I did not check for any settings to optimise for size, remove debug information etc. - anyway the size of the compiled program does not matter.)

Code
Here is the complete Assembly source code of the pure Windows IA-32 Hello World, pretty much like it was answered by caffiend,
STD_OUTPUT_HANDLE equ -11
%define NULL    dword 0

    extern  _GetStdHandle@4
    extern  _WriteFile@20
    extern  _ExitProcess@4

    global _main

    section .text

_main:
    ; local variable
bytesWritten equ -4
    mov     ebp, esp
    sub     esp, 4

    ; hStdOut = GetstdHandle(STD_OUTPUT_HANDLE)
    push    STD_OUTPUT_HANDLE
    call    _GetStdHandle@4
    mov     ebx, eax

    ; WriteFile(hstdOut, message, length(message),
    ;           &bytesWritten, null);
    push    NULL
    lea     eax, [ebp + bytesWritten]
    push    eax
    push    (message_end - message)
    push    message
    push    ebx
    call    _WriteFile@20

    ; ExitProcess(0)
    push    0
    call    _ExitProcess@4

    ; never here
    hlt
message:
    db      'Hello, World', 10
message_end: