3 July 2015

Hello World Windows 32 Assembly

During my time at university I wrote a lot of Assembly for the 80x86 but stopped doing so when I moved into commercial software development. Recently - just 20 years later ;-) - I felt an urge to go back and refresh my memory. This was not the first time I felt retro, but this time I did not know how to get started. I did not know anything about current tooling and had never created a native (Windows) application before. Usually the code I write today, be it in Java, C#, Ruby or otherwise, relies on underlying interpreters or virtual machines. (Yes I know, some people still write C and C++...)

Getting Started
Just by coincidence, one of the user groups in Vienna, the We Love Programming Languages group, dedicated one of its meetups to Assembly. There Angeliki Chrysochou showed a Hello World in Assembly, which gave me a good idea of what I had to do. During the research that followed I discovered how to write Hello World in Assembly under Windows on StackOverflow, which gave me all the information I needed to get started, or so I thought. In the end it still took me several hours to create a Hello World that worked, which is why I decided to write this summary.

The Language
The modern Intel IA-32 flavour of 80x86 Assembly looks pretty much like the one I worked with in the '90s. I had no problem with that. Of course CPU architectures have become more complex and there is far more to watch out for when optimising for performance, e.g. branch prediction, pipeline stalls and so forth, but the general idea has stayed the same.

The Assembler
Angeliki used NASM, the Netwide Assembler, in her presentation and it looked good, so I tried it as well. It worked great and I did not check out other tools (like MASM). The command to translate (does Assembly get compiled?) an ASM file for Windows with NASM is
nasm -fwin32 helloworld.asm
This creates an OBJ file which needs to be linked.

Operating System Calls
As soon as you want to do anything regarding input, output or even exiting the current application, you need operating system calls. You have to search the MSDN Windows API Index for the calls you need and decorate them according to the Windows Application Binary Interface (ABI). (You could also use C functions, but that would not be pure Assembly, would it?) The decorated function name has to be declared external and its parameters are pushed on the stack from right to left, as in CDECL. (In fact Win32 API calls use the STDCALL convention, where the callee cleans up the stack.) The source of the shortest NASM Windows application is
global _main
    extern  _ExitProcess@4

    section .text
_main:
    push    0
    call    _ExitProcess@4
It took me time to figure out why ExitProcess is _ExitProcess@4 but MessageBox is _MessageBoxA@16: the number after the @ is the STDCALL decoration, the total size of the parameters in bytes - ExitProcess takes one 4-byte parameter, MessageBoxA takes four - and the A suffix marks the ANSI variant of the function.
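For illustration, a sketch of a message-box version (not from the original post and untested) would declare the decorated import and push its four parameters from right to left; linking it would additionally require user32.lib:
    extern  _MessageBoxA@16
    extern  _ExitProcess@4

    global  _main

    section .text
_main:
    push    0                   ; uType = MB_OK
    push    caption             ; lpCaption
    push    text                ; lpText
    push    0                   ; hWnd = NULL
    call    _MessageBoxA@16     ; 4 parameters * 4 bytes = @16
    push    0
    call    _ExitProcess@4
caption:
    db      'Hello', 0
text:
    db      'Hello, World', 0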

The Linker
Finally the generated OBJ file has to be combined with the used system libraries into a single executable program. I found several ways of doing this under Windows. The native Windows way is to use link.exe from Microsoft Visual C++, e.g.
link /entry:main helloworld.obj /subsystem:console /nodefaultlib kernel32.lib
kernel32.lib is necessary for ExitProcess to be found. Getting the linker was a bit of a hassle because you need to install Visual C++ or at least some Windows SDK to get it. I did not want to do that, but I remembered that I had used the command line part of VC 6.0 aka VC98 to compile native extensions for Ruby 1.8.6 before the Ruby DevKit became popular. After recovering VC98.zip from the depths of my hard-disc, I enabled it by calling its vcvars32.bat, which sets the PATH, INCLUDE and LIB environment variables; kernel32.lib, for example, is located in its library folder. As far as I know (from StackOverflow), vcvars32.bat is still available in current VC++. I guess I would need a newer kernel library to use the latest Windows 32 functions, but this was enough for Hello World and I was able to link my OBJ.

Another way to link under Windows is to use Unix tool ports like MinGW, a minimalist development environment for native Microsoft Windows applications. MinGW 32 comes with ld.exe, e.g.
ld -o helloworld.exe helloworld.obj -e_main -lkernel32
      --enable-stdcall-fixup %SYSTEMROOT%\system32\kernel32.dll
Again I did not want to download and install huge tools, I just wanted to create a little Assembly Hello World. This should not be so hard. As I said before, MinGW has been used to compile native C extensions for Ruby since 1.8.7, and I just used the one that came with the Ruby DevKit. After calling its devkitvars.bat I was able to create my helloworld.exe.

There are standalone linkers like GoLink which might work, but I did not check any of them after I had success with both link and ld. In general I would prefer something small, like NASM, which is just a single executable. (Edit: Yes, GoLink does work and is just a single executable, exactly what I need.)

Execution
One beauty of Assembly is the size of the created applications. I remember that one of my smallest DOS applications was around 20 bytes (!) in total, a COM application which did not require any linking. Agreed, it did not do much, it just turned off Num-Lock, but it was useful to me at that time. The Hello World's object file is 445 bytes and the executable helloworld.exe created by ld is around 4 kB, the one created by link 12 kB. (I did not check for any settings to optimise for size, remove debug information etc. - anyway the size of the compiled program does not matter.)

Code
Here is the complete Assembly source code of the pure Windows IA-32 Hello World, pretty much as it was answered by caffiend:
STD_OUTPUT_HANDLE equ -11
%define NULL    dword 0

    extern  _GetStdHandle@4
    extern  _WriteFile@20
    extern  _ExitProcess@4

    global _main

    section .text

_main:
    ; local variable
bytesWritten equ -4
    mov     ebp, esp
    sub     esp, 4

    ; hStdOut = GetStdHandle(STD_OUTPUT_HANDLE)
    push    STD_OUTPUT_HANDLE
    call    _GetStdHandle@4
    mov     ebx, eax

    ; WriteFile(hStdOut, message, length(message),
    ;           &bytesWritten, null);
    push    NULL
    lea     eax, [ebp + bytesWritten]
    push    eax
    push    (message_end - message)
    push    message
    push    ebx
    call    _WriteFile@20

    ; ExitProcess(0)
    push    0
    call    _ExitProcess@4

    ; never here
    hlt
message:
    db      'Hello, World', 10
message_end:
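For reference, the complete build using the commands from above should produce a working console executable:
nasm -fwin32 helloworld.asm
link /entry:main helloworld.obj /subsystem:console /nodefaultlib kernel32.lib
Running the resulting helloworld.exe prints the message.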

5 June 2015

Choose Your Development Services Wisely

URIs Should Not Change
Modern software development relies a great deal on the web. Our source code is hosted on GitHub, we download necessary libraries from Maven Central, Ruby Gems or NPM. We communicate and discuss using issue trackers and forums and our software is built on hosted Continuous Integration servers. We rely a lot on SaaS infrastructure. But the Internet is constantly in motion. Pages appear and vanish. New services get created, moved around and shut down again. When Tim Berners-Lee created the web, he wished for URIs that would not change, but unfortunately the reality is different.

I hate dead links. While dead links are usually not a big deal - you just use an Internet search to find their new homes - I still find them annoying. It is impossible to update all dead links on my blog and in my notes, but at least I want the cross references to my own stuff to be correct. This means that I have to migrate and update something at least once a year. The recent shutdown of Google Code affected me and caused a lot of extra work. This made me think.

Learning: Host as much of your content as possible under your own control, e.g. on your personal web space.

To forgo the use of today's powerful and often free services like GitHub would be foolish, but I still believe we (developers) should consider them dependencies, and as dependencies they are also liabilities. We should try to reduce coupling and be aware of potential migration costs.

Learning: When choosing a SaaS service, think about its benefits versus the impact of it not being available any more.
Learning: When you pay for a service, it is more likely to be stable.

Personal Development Process
The impact can be internal, which means it affects only you and your work, or it can be external, which means it affects others, your users or people using your code or documentation. For example let us consider CloudBees. I started using it in the early beta and it was great. Over time I moved all my private, kata and open source projects there and had them built on each commit. It was awesome. But last year they removed their free plan and I did not want to pay, so I stopped using it. (This is no criticism. CloudBees is a company and needs to make money and reduce cost.) The external impact was zero as the Jenkins instance was private. The internal impact seemed huge: I had lost my CI. I looked for alternatives like Travis, but was too lazy to configure it for all my projects. Then I used the Jenkins Job Import Plugin to copy my jobs to a local instance and got it sort of running. (I had to patch the plugin, wasting hours...) Still I needed to touch every project configuration and in the end I abandoned CI. In reality the internal impact was also low, as I am not actively developing any Open Source right now and I am not working on real commercial software where CI is a must. Now I just run my local builds more often. It was cool to use CloudBees, but I can live without it.

Learning: Feel free to use third party SaaS for convenience, i.e. for anything that you can live without easily.

Information Sharing
Another example is about written information, the Hackergarten wiki. When I started Hackergarten Vienna in 2011, I collected material on how to run it and put it into the stub wiki someone had created for Hackergarten. I did not think about it and just used the existing wiki at Wikispaces. It seemed the right place. There were a few changes by other users, but not many changes at all. Two years later Wikispaces removed their free plan. Do you see a pattern? The internal impact was zero, but the external impact was high, as I wanted to keep the information about running a Hackergarten available to other hackers. Still I did not want to spend $50 to keep my three pages alive. Fortunately Wikispaces offered a raw download of your wiki pages. I used this accessible copy of my work and converted the wiki pages into blog pages in no time. As I changed the pages rarely, the extra overhead of working in HTML instead of Creole was acceptable. Of course I had to update several links, a few blog posts and two slide-decks, sigh. (And it increased my dependency on Google Blogger.)

Learning: When choosing a SaaS service, check its ways of migrating. Avoid lock-in.
Learning: Use static pages for data that rarely changes.

Code Repositories
Moving code repositories is always a pain. My JavaClass Ruby Gem started out on RubyForge; later I moved it to Google Code. With Google Code shutting down I had to migrate it again, together with seven other projects. The raw code and history were no problem, hg convert dealt with that. But there were a lot of small things to take care of. For example, different version control systems use different ignore syntaxes. The Google Code project description was proprietary and needed to be copied manually. The same was true for wiki pages, issues and downloads.

I had to change many incoming links. The first URL to change was the source repository location in all migrated projects' descriptors, e.g. Maven's pom.xml, Ruby's gemspec, Node's package.json and so on. Next were the links to and from project wiki pages, and finally I updated many blog posts and several slide-decks. And all the project documentation, e.g. Maven sites or RDoc API pages, needed to be re-generated to reflect the new locations. While this would be no big deal for a single project, it was a lot of work for all of them. I full-text-searched my hard-disc for obsolete URLs and kept finding them again and again.

Maybe I should not cross-link my stuff that much, and I am not even sure I do link that much at all. But instead of putting the GitHub URL of the code kata we will be working on in a Coding Dojo directly into the slides, I could just write the URL on a flip-chart at the beginning of the dojo. The information about the kata seems to be more stable than the location of the source code. Also I might use the same slides when working on code in different languages, which might be stored in different repositories. But on the other hand, if I bother to write documentation and I reference something related, I expect it to be linked for fast navigation. That is the essence of hyper-text, isn't it?

Learning: (maybe) Do not cross link too much.
Learning: (maybe) Do not link from stable resources to less stable ones.

Generated Artefacts
Next to source code I had to migrate generated artefacts like my Maven repository. I had used the Google Code feature that a repository is accessible in raw mode: I would just push to my Maven repository repository (recursion, yeah ;-) and the newly released artefacts would show up. That was very convenient. Unfortunately Bitbucket could not do that. I had a look into Bitbucket pages, but really did not feel like changing the layout of the repository. I was getting tired of all this. In the end I just uploaded the whole thing to my public web space. Static web pages and binary files, e.g. compressed archives, can be hosted on any web server and I should have put them there in the very beginning. Again I had to update site locations, repository URLs and incoming links in several projects and blog posts. As I updated my parent POM, I had to release new versions of several projects. I started to hate hyper-links.

Learning: Host static data on regular (personal) web spaces.

You might argue that Maven Central would be a better place for Maven artefacts and I totally agree. I consider Maven Central much more stable than my personal web space, but I did not bother to go through the process of getting access to a service that would mirror my releases to Central. Anyway, such a mirroring service, like Sonatype's, feels less stable than Central itself.

Learning: Host your stuff on the most stable option available.

Now all my repositories are hosted on Bitbucket. If its services stop working some day, and they surely will at some point in the future, I will stop using hosted repositories for my projects. I will not migrate everything again. I am done.

Learning: (maybe) Do not bother with dead links or losing your stuff. Who cares?

CNAMEs
For some time Bitbucket offered a CNAME feature that allowed you to associate a domain or sub-domain with an account. That was really nice: instead of bitbucket.org/pkofler/project-x I used hg.code-cop.org/project-x. I liked it, or so I thought. Of course Bitbucket decided to disable this feature this July and I ended up - again - updating URLs everywhere. While the change was minimal, as paths and parameters of the URLs stayed the same, I had to touch almost every source repository to change its public repository location, fix 20+ blog posts and update several Coding Dojo slide-decks, which in turn needed to be uploaded to SlideShare again.

Learning: Only use the minimal, strictly necessary features of a SaaS.
Learning: Even when offered, turn off all features you do not need, e.g. wiki, issues, etc.
Learning: Do not use anything just because it is handy or "cool". It increases your coupling.

URLs Change All The Time
Maybe I should not link too much in this post as eventually I will have to change all the links again, grrrr. I am considering using a URL service like bitly. Maybe not exactly like bitly, because its purpose is the shortening and marketing aspect of links, and I do not see a way to change the actual link once it has been created. And I would depend on yet another service, which eventually would go away. So I would need to host the service myself, like J. B. Rainsberger does. I like the idea of updating all my links with a single update statement in the link database. I wish I had used such a thing. It would increase the work of creating links, but would reduce the work of changing them. Like with code, it seems that my links are changed far more often than they are created, at least some of them.

I could not find any service, free or otherwise, providing this functionality, so I would have to create my own. Also I do not know the impact of the extra redirect on search engines and other automatic consumers of my content. And still some manual changes are inevitable: if the repository location moves, the SCM settings need to be changed. So I will just wait until the next feature or whole service is discontinued and I have to start over.
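Just to sketch the idea (purely hypothetical, nothing I have actually set up): with Apache, a single text-file rewrite map could hold all the targets, every public link would point at the redirector, and moving a repository would mean editing one line in the map.
# httpd.conf (server context, names made up)
RewriteEngine On
RewriteMap links txt:/var/www/links.txt
RewriteRule ^/go/(.+)$ ${links:$1} [R=302,L]

# links.txt: one line per link, the single place to update
project-x https://bitbucket.org/pkofler/project-x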

Thanks to Thomas Sundberg for proof-reading this article.

29 May 2015

How to Organise Your Code Katas

If you read my blog you know that I like code katas. I did my first one back in 2004 after reading Kent Beck's Test Driven Development book. I was learning Ruby and Kent had recommended creating an xUnit implementation as an exercise to get to know a new language. I did not know it was a kata, I just developed my personal RUnit following the TDD principles.

Over the years I did similar exercises and started to perform formal katas somewhere in 2009/2010. I used them for personal practice and for demos, e.g. to show students what TDD could look like as part of my QA guest lecture. At my first Code Retreat I noticed that pairing on a code kata gave me more insights, so I looked for people who would spend some time with me practising, usually remotely.

Tip: Keep the source of your katas (or at least a record)
I do keep the code of my katas. Even at Code Retreats where you are supposed to delete your code, I manage to recover the source from local history or version control. When I use online tools like Cyber-Dojo, I write down the session id and come back later to recover the code. (I know I am a bad person. ;-)

At the time of writing, I have worked on more than 230 code katas resulting in almost 300 sessions of personal or paired practice. (In remote katas we sometimes tackle larger problems and then work on them for more than one session.) I have a lot of kata sources and they are all over my hard-disc. I managed to collect some of them in dedicated repositories, one for each programming language, but many are in their own projects, or even mixed up with other code in early "learning" repositories. It is a mess. (Probably it does not matter, but it is a mess nevertheless.)

Tip: Collect all your katas in one place
In her article about code katas, Iris Classon gave some practical advice regarding katas, e.g. to collect them all in one place, to be able to compare solutions. I really liked the idea and recently I found some time to collect my katas in one place - or so I thought.

Tip: Name your code katas consistently, even across languages
I faced some problems. First, not all my katas followed the same naming conventions, e.g. prime_factors and primefactors were good enough for me, but not for a unified collection. Most of my katas were in version control and I did not want to rename or move large numbers of files, because I would lose the history. (Again, the history did not matter for katas, especially as I never looked at it - but I was not able to drop practices I follow every day.)

Tip: Put your katas in kata-only repositories
I managed to extract katas out of mixed repositories using hg convert --filemap with a filemap of simple inclusions, preserving the whole history. I also fixed some inconsistent kata folders with a filemap of renames. But hg convert created new, unrelated repositories and I had to drop and recreate some of my Bitbucket repos.
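For illustration, such a filemap might look like this (the paths are made up); running hg convert --filemap filemap.txt mixed-repo katas-repo then produces the filtered repository:
# keep only the kata folders
include katas/bowlinggame
include katas/primefactors
# unify inconsistent folder names
rename katas/primefactors katas/prime_factors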

My vision was to place the same katas, regardless of language, near one another, so I would be able to count and compare them. But I did not find a way to do that across all the different languages while keeping the code working. How would I combine Java, NodeJS, Ruby and Scala sources in a consistent way, other than separating them by source folder, which they already were? I failed to merge all my katas and looked for alternatives.

In the end I created a little script that would search my hard-disc for katas, normalise their names and collect them in a single place. I grouped the sources first by kata name and then by date. The programming language did not seem that important for a combined collection and ended up after the date, resulting in the name pattern [name of kata]/[ISO date] [- optional comment] ([programming language]). The whole collection looks like this:
All Katas
|-- bankocr
|-- bowlinggame
    |-- 20120924 (Java)
        `-- BowlingTest.java
    |-- 20130307 (Scala)
        `-- BowlingGameSuite.scala
    |-- 20130414 (JavaScript)
        `-- BowlingGameSpec.js
    `-- ...
|-- fizzbuzz
`-- ...
Tip: Compare your katas
In the beginning I believed that the code of a kata, the final product, did not matter, that only the process of getting there, the practice, was important. That was the reason why people published kata-casts: the final code did not say much on its own. Later I discovered that looking at the code of a problem that I knew well, and reading comments regarding that code, also had value. So I searched the web for source code and recordings of the katas I had done. I planned to look at them, compare them with my solutions and learn from them even more. Of course I never had the time to do this.

But now I can see all the exercises I ever did at a glance. Now I have the opportunity to compare my solutions across time and languages. This is the perfect time to include my bookmarks of katas as well. This is going to be interesting, because I never had a second look at my katas before.

(See all my posts about code katas and my public kata repositories.)