30 November 2014

TDDaiymi vs. Naked Primitives

For our recent session Brutal Coding Constraints at the Agile Testing Days 2014, Martin Klose and I experimented with combinations of constraints for code katas and coding dojos. A constraint, also known as an activity, is a challenge during a kata, coding dojo or code retreat, designed to help participants think differently about writing code than they would otherwise. Every activity has a specific learning goal in mind. Most coding constraints are an exaggeration of a fundamental rule of clean code or object-oriented design. We tried to find particularly difficult combinations that would challenge expert developers and masters of object orientation. One brutal mixture of constraints is TDD as if you Meant it, No IFs and No naked primitives.

Goya's Naked Maja - pug style
TDD as if you Meant it
Several years ago Keith Braithwaite noticed that many developers practise some kind of pseudo-TDD: they think of a solution and create the classes and functions that they just know they will need to implement it. Then their tests keep failing and much time is spent debugging. Instead they should follow Kent Beck's original procedure of adding a little test, seeing it fail and making it pass by some small change. To help developers practise these original principles Keith defined TDDaiymi as an exercise to really do TDD. The exercise uses a very strict interpretation of the practice of TDD, avoiding up-front design and helping people get used to emergent design by refactoring. It is a difficult exercise and I still remember the first time I did it in a session run by Keith himself. For almost two hours my pair and I struggled to create anything useful. While TDDaiymi is a bit extreme, some people believe it to be the next step after TDD. German "Think Tank" Ralf Westphal even claims that following TDD as if you Meant it is the only way TDD can give you good design results.
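Beck's micro-cycle can be sketched as follows (a hypothetical, minimal Java example of my own, not part of Keith's exercise):

```java
// Hypothetical example of one red-green cycle. The test below is written
// first and fails (red); then fromCelsius is changed just enough to pass (green).
class Fahrenheit {
    static double fromCelsius(double celsius) {
        return celsius * 9 / 5 + 32; // the smallest change that makes the test pass
    }
}

class FahrenheitTest {
    public static void main(String[] args) {
        // red phase: this assertion failed while fromCelsius was still unimplemented
        if (Fahrenheit.fromCelsius(100) != 212.0) {
            throw new AssertionError("expected 212");
        }
        System.out.println("green");
    }
}
```

Only when the bar is green does a TDDaiymi practitioner extract structure, by refactoring.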

No naked primitives
No naked primitives is another nice constraint. All primitive values, e.g. booleans, numbers or strings, need to be wrapped and must not be visible at object boundaries. Arrays, all kinds of containers like lists or hash-tables and even Object (the root class of the language's class hierarchy) are considered primitive as well. Similar to Keith's TDDaiymi this rule is designed to exercise our object orientation skills. A string representing a name is an under-engineered design because many strings are not valid names. In an object-oriented system we would like to represent the concept of a name with a Name class. Usually Value Objects are used for this purpose. Also a list of shopping items is not a shopping basket. A general-purpose list implementation offers operations that do not make sense for a shopping basket, so containers need to be encapsulated. While it is exaggerated to wrap all primitives (see Primitive obsession obsession), I have seen so many cases of Primitive Obsession that I would rather see a few additional value objects than another map holding maps holding strings. (To avoid that I created a custom PMD rule PrimitiveObsession that flags all primitives in public signatures of Java classes.)
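For illustration, a minimal Name value object in Java might look like this (a hypothetical sketch of mine, not taken from the article or the PMD rule):

```java
// Hypothetical value object wrapping a naked String.
// The constructor validates, so not every String can become a Name.
final class Name {
    private final String value;

    Name(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("not a valid name: " + value);
        }
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Name && value.equals(((Name) other).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Being a Value Object, two Names with the same value are equal, and invalid names are rejected at the boundary instead of leaking through the system as arbitrary strings.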

Strange Conditions
No IFs
No IFs is a great constraint from the Missing Feature catalogue. All conditionals like if, while, switch and the ternary operator (e.g. _ ? _ : _) are forbidden. Like No naked primitives this is exaggerated, but many code bases are full of abysmally deep nesting, like the PHP Streetfighter. This has been recognised as a problem and there are campaigns to reduce the usage of ifs. After all, excessive usage of if and switch statements is a code smell and can be replaced by polymorphism and other structures. For practice the constraint is taken to the extreme. I have seen many developers struggle with the missing concept in the beginning, but after some thought each and every one of them found a way to express his or her concepts in another, less nested way.
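As a sketch of how such a replacement can look (a hypothetical Java example of mine, not from the session): a conditional on a customer type becomes two implementations of one interface:

```java
// Hypothetical example: instead of
//   price = customer.isPremium() ? amount * 0.9 : amount;
// the branching is replaced by polymorphism.
interface Pricing {
    double priceFor(double amount);
}

class RegularPricing implements Pricing {
    public double priceFor(double amount) {
        return amount; // regular customers pay full price
    }
}

class PremiumPricing implements Pricing {
    public double priceFor(double amount) {
        return amount * 0.9; // premium customers get 10% off
    }
}
```

The caller holds a Pricing and never asks which kind of customer it is; the decision is made exactly once, when the object is created.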

Combining Constraints
Some constraints like TDDaiymi are already difficult on their own, but obeying several of them at the same time is hard. We stick to our (coding) habits even if we planned to do otherwise. While working on the experiments, Martin and I would notice a constraint violation only after switching roles several times, several minutes after introducing the "forbidden" concept. This is especially true for conditionals because they are fundamental to our (imperative-styled) coding. It seems that tool support for constraints would help, but not all constraints can be enforced automatically, and definitely not for all programming languages.

Refactoring Towards Constraints
It is difficult to find solutions under constraints - after all that is one of their primary purposes - so we decided to first make it green and then refactor towards the constraint. This seems natural as TDD itself has a similar rule: after creating the next failing test (red phase), try to get it green as fast as possible, creating dirty code and duplication as needed (green phase). Then clean up the mess, relying on the safety net of the already existing tests (refactor phase). This resulted in more refactoring than usual, and we spent more than half of the time refactoring. As TDDaiymi puts much more focus on the refactoring step, this might have been due to the first constraint alone. 50-50 for a code kata with a refactoring-focused constraint is not much, given that some people report three refactoring commits per one feature commit in their production code.

Escapologist, Covent Garden, London
Designing Up Front?
When refactoring towards the constraint, it is often hard to change the current ("green") solution because it is rooted in primitive data types or conditionals or whatever violates the constraints. Sometimes we had no clue how to refactor and left the violation in the code for several red-green-refactor cycles until it became clear how to proceed. Being hard is not a problem, because this is an exercise after all, but allowing violations for extended periods of time made me unhappy, as the code was not completely clean regarding our standards after each cycle. Also the code looked weird, but some constraints are weird and therefore force odd solutions. Following TDDaiymi we did not think about any design up front, but maybe we should have done so. Thinking about structures (i.e. design) that support the constraint, e.g. the absence of conditionals, greatly helps the code, and such "designed" solutions look much better. While we should think about the problem, the technology-agnostic solution and its acceptance criteria (test cases) before starting to write code, classic TDD discourages creating a design up front, and TDDaiymi does so even more. I guess more practice is needed to resolve this issue.

Evolving Structures
Another, more direct contradiction is the combination of TDDaiymi and No naked primitives. While following the former constraint, we want to avoid premature design decisions and sometimes use Object as a place-holder for a concept as long as there is no structure yet. Maybe this can be avoided but I do not know how. (If you know how to TDDaiymi without plain objects, I am more than happy to pair with you on some exercises.) In the end, when the design is finished, the Object has been replaced, but as the final structure evolves late, it may stick around for some time. On the other hand Object is a primitive and therefore forbidden. Maybe it is special and should not count as a primitive. I have never seen any under-engineered code base using plain objects; on the contrary, they usually contain lots of strings with the occasional boolean or number, but no naked Object. So I believe it is safe to allow Object as a primitive because it will be replaced eventually.

When I started to write this article I thought its conclusion would be that some coding constraints are just not compatible. As constraints are exaggerated principles, this would be no surprise. But now, after writing and thinking about it, I am not sure it is a problem. On the contrary, the discussion helped me understand the constraints, their origins and intent. My real conclusion is that doing katas with constraints sparks interesting discussions and is a great way of learning about one's own coding habits and many aspects of coding in general. I strongly encourage you to do so.

Thanks to Martin Klose for the countless hours of pairing and discussing these constraints with me.

2 October 2014

Visualising Architecture: Maven Dependency Graph

Last year, during my CodeCopTour, I pair programmed with Alexander Dinauer. During a break he mentioned that he had just created a Maven Module Dependency Graph Creator. How awesome is that. The Graph Creator is a Gradle project that allows you to visually analyse the dependencies of a multi-module Maven project. At first it sounds a bit weird - a Gradle project to analyse Maven modules. It generates a graph in the Graphviz DOT language and an HTML report using Viz.js. You can use the dot tool which comes with Graphviz to transform the resulting graph file into JPG, SVG etc.

In a regular project there is not much use for the graph creator tool, but in super large projects there definitely is. I like to use it when I run code reviews for my clients, which involve codebases of around one million lines of code or more. Such projects can contain up to two hundred Maven modules. Maven prohibits cyclic dependencies, yet projects of such size often show a crazy internal structure. Here is a (down-scaled) example from one of my clients:

Maven dependencies in large project
Looks pretty wild, doesn't it? The graph is the first step in my analysis. I need to load it into a graph editor and cluster the modules into their architectural components. Then outliers and architectural anomalies become more apparent.

28 September 2014

Advanced Personal Branding

This February I wrote three articles on personal branding for software developers. I discussed creating and strengthening your brand step by step: branding all your accounts and defining your motto, sharing and promoting yourself, and maintaining a technical blog. I sorted these activities by difficulty into a kind of personal branding ladder, which will vary for different people depending on their personality. This final article covers advanced branding activities, that is, the more difficult and more time-consuming ones.

Go Out!
Edge Conference at Google New York
Personal branding is - well - personal. So you need to meet people in person and interact with them. Find your local user groups or technology-related meetups and attend regularly. Join the discussion and talk about the topics you are interested in. You do not need to present anything formally. Regular listeners who ask questions now and then are vital for the existence of any community. There is no way you can fail here. As long as you are authentic, people who share your enthusiasm will want to meet you and discuss your topics. You are interesting to like-minded developers, you just need to allow them to find you.

Present at a User Group
After attending the user group meetings regularly, it is time to take the next step and present something yourself. It is true that some people would rather face death than talk in front of a crowd, but the usual audience at community meetings is forgiving. Remember, most people in the audience are like you and already know you in person. They gave similar beginner talks or know that they should. First-time speakers cannot be expected to give flawless talks, and that is the beauty of user groups: full of natural human beings delivering refreshing and idiosyncratic presentations. Some communities are so successful in encouraging their members to talk that they continuously breed world-class speakers.

Nevertheless I am not saying that your talk does not need rigorous preparation and practice up front. There are several basic things that you can screw up in presentations in general, like showing a wall of text or causing death by PowerPoint. Do your research and read some articles on preparing content, creating slides, presentation techniques and such. There is also much content available on technical presentations in particular. In fact there is only one rule you need to keep in mind: your presentation is not about you. It is not about you becoming a rock star speaker, it is about serving the audience. For example, if you want the audience to read your code samples, make them easy to understand and write them in a large enough font. If the font is too small you are actually conveying the message that you do not care whether people read it.

I gave my first presentation at the local Java user group five years ago. It worked out well and today, many presentations later, I still like to talk to smaller groups because these presentations often become conversations and large crowds make me nervous. If you have an extrovert character and like talking to people, giving regular presentations could be less cumbersome than maintaining a technical blog. Here you might change the order of steps. Anyway you need to do both!

Organise a User Group
Organising a local community is hard work. Meetings have to be scheduled, speakers contacted and so on. Usually the group leaders' work is not seen but is vital for a thriving community. Step up and help the organisers; your help will be appreciated. Or maybe there is no local community for your favourite topic yet - then it is high time to create one. Creating new communities is easy using tools like Meetup or social media platforms. For example, Aaron Cruz is a "community factory". He created a new meetup for a topic that clearly had been missing, Clojure Vienna, and organised a few meetups, which were a great success. Then he transferred the ownership of the group to the most enthusiastic member and went on to create another meetup.

Your group does not have to be local, there are good examples of virtual communities as well. For example Jonny Andersson runs a remote reading group, a small group of people sharing the interest to learn from books. Another, quite different example is the vJUG, the on-line Java User Group, which brings well known speakers on-line every month.

As I said above, being a community leader can be a lot of work. For example, Peter Brachwitz of Clojure Vienna told me that he prepares a presentation if nobody else volunteers to do so. Now that is great leader spirit! Despite the effort, running a user group is a highly rewarding activity. You will be able to watch great presentations (if you organise them ;-) and have "exclusive" access to speakers and other community leaders. For example, in the Java world there is an international JUG Leaders' conference once a year.

Cydcor Conference
Present at a Conference
Presenting at user groups is often informal, sometimes becoming a discussion rather than a polished presentation. The larger the audience gets, the more formal and professional your presentation needs to be. When submitting a talk to a large and well-known conference like Devoxx, you are competing with many other speakers to get accepted. Your future talk also needs much more practice. When facing 80 or 100 people for the first time, who are all looking at you in eager anticipation, your brain is likely to shut down, unless you are naturally gifted. At least mine did, and I have not even talked to really large crowds yet.

So your presentation needs more preparation: several dry runs, maybe even showing it to a colleague for feedback. This is much work, which keeps me from doing it too often, if at all. And I do not believe in talks with little or no preparation. Even if you do not mind making a fool of yourself, you are doing your audience a disservice. You are wasting their precious (and limited) learning time, when instead they could listen to great talks in parallel tracks.

At international events you might meet new people and grow your network beyond your local area. While this is already true for all attendees, the "magic" speaker badge lets you stand out. Other speakers will talk to you and regular attendees will stalk you to ask questions ;-) And speaking at international conferences can make you famous, really world-famous if you work hard. Working hard means attending at least one conference each month, all around the globe, besides your regular work. This is really tough, as veteran speakers like Dan North or Thomas Sundberg have assured me.

Organise a Conference
Did you spot the pattern? Find some event, attend, contribute and finally organise one yourself. What is true for local events is even more true for conferences. Again you can start small with local one-day conferences embedded in larger communities like Eclipse DemoCamp, Google DevFest or Code Retreat. Your event is likely relying on the infrastructure and help of a well running user group, because you cannot do everything by yourself. For example when we started Eclipse DemoCamp in Vienna five years ago, we did so with the help of the Java User Group Austria.

A much better example is GeeCON, my favourite Java conference, which I attend every year. I believe its story is the following: some guys of the Polish JUG met and complained about the lack of a great conference in Poland. They decided to create one. Already the first edition of GeeCON was a huge success and over the years the conference became one of the most awesome events I have ever attended. But GeeCON is also a perfect example of the hard work needed to run such an event. If the organisers are not in a hurry, e.g. to buy more wireless routers or fix some other problem, they are walking around the venue slowly, with tired, dark-circled eyes. Lukasz Stachowiak, member of the GeeCON organisers, once told me that preparations for the next edition of GeeCON start on the very next day after the previous one has ended. I am sure this is also true for Devoxx and all community-driven conferences.

Against Banned Books
Write a Technical Book
Finally we reach the top of the food chain. Writing a book is probably the most time-consuming activity. Tomek Kaczanowski told me that it took him two years to get his Practical Unit Testing book delivered. But the time was well spent. His book is getting more and more popular, which is nice given the little revenue each book brings, but much more important is the widespread reception of his book.

As I did not write any books myself, I can only refer to articles about doing so. For example, Jack Shirazi's discussion of whether writing a technical book is worthwhile covers income vs. non-direct income vs. the time spent writing a book. It says "People are impressed by authors. If you have had a book published in a certain area, even if that book did not do particularly well, people are impressed." Since 2007, when he wrote the article, things have changed in publishing if you decide to self-publish, as Tomek did. Jurgen Appelo recommends reading at least three books about self-publishing before starting with it. And if you think about writing a book you should first read Rands' excellent explanation of how to write a book.

It seems that writing a book is sort of addictive. Many authors I know have written more than one book or at least think about writing another one, even if it would make their wives unhappy (again). Tomek decided to give his third book away for free. Bad Tests, Good Tests is a short book; nevertheless it took him some time to write it. Sure, giving the book away for free removes its direct income aspect, but increases its non-direct income, as more people will get it. Anyway, free or not, it is a great book and you should read it!

Thanks to all the people I used in this blog post as examples for successful branding activities, especially as I did not ask for their permission to do so.

18 September 2014

Teaching using Mob Programming

For some time now I have been training young developers. I meet with different teams and we discuss topics like Pair Programming, or I run hands-on workshops in the Coding Dojo format. Occasionally I pair program with single team members on difficult tasks - mentoring on steroids.

The first session
Last November one team I work with grew tired of Coding Dojo examples and wanted to work on real code while keeping the good atmosphere of our sessions. I envisioned sort of a dojo in Randori style, with everybody on the team pairing with me one after another. It ended with me being busy driving and all the others navigating in different directions at the same time. Additionally, as teacher, I had to think about the proper action for the code and discuss the different proposals of the navigators. It was difficult and showed many opportunities for improvement.

Enter Mob Programming
Angry Mob Play Set
At the same time I learned about Mob Programming. Mob Programming is defined as all the brilliant people working at the same time, in the same space, at the same computer, on the same thing. That means that most of the mob's work is done sitting together in a group work area. All the necessary skills are present together in the same room, so there is no waiting time. If you have never heard of it, I recommend watching this presentation by Woody Zuill about Mob Programming, presented at the Oredev conference last year. Woody's team has been using this technique for some time for all production work.

My Sessions
Inspired by Mob Programming I applied some of its concepts to my sessions. For example I never drive and I also do not navigate. Instead I try to meta-navigate the team of navigators. (Or sometimes I simply dominate everybody and tell the driver exactly which keyboard short-cuts to type. ;-) In the following sessions we refactored some code and created unit tests. All these sessions worked out well. Now, after running more than ten Mob Programming sessions with various teams during the last year, here are some things I noticed:

Mob Programming enables communication. In one of the first sessions, the architect of the team saw something in the source he did not like. He mentioned it to be an architecture violation but the other team members did not know this aspect of the architecture. It was amazing to see them talk. Although being on the same team they had not talked about it before. It is known that regular pair programming spreads the knowledge throughout the team and Mob Programming does even more so.

angry mob!
Participation
"Mob learning" differs from regular "mob work". Some people like learning sessions less than "the real thing" Also as in any team and any team activity some are more interested in the session than others. But usually participants are engaged, probably due the good communication. One major factor for active participation is the team size. Sessions with too many people do not work out well. Larger groups split into subgroups with their own agendas. I recommend group sizes of ten or less people.

The teams I work with are mixed teams with members of different experience and skill. It happens that some junior members cannot follow a high-level discussion and zone out. I believe this is a symptom of the group needing more knowledge sharing and communication in general. I guess this does not happen to experienced "mobs", but I do not know. As moderator I try to get all people engaged and ask the seniors to explain the problem to the whole team because "the team needs to know". I also move forward very slowly so everyone is able to keep up. This is even more important at the end of the session, as people get tired and zone out more easily.

Due to their active nature, "mob sessions" are quite exhausting. Ten minutes of break every 50 minutes of work help to keep people fresh. Also, regular rooms run out of fresh air quickly, so I open the windows as much as possible during these breaks. Most meeting rooms I have been to have the projector on the shorter side of the room, so almost all people have to sit with their heads turned. This is uncomfortable and tiring when sitting like that for several hours. While there are no dedicated work areas for mobs, internal training rooms are better because they are usually arranged like classrooms. But I have yet to find the perfect mob working area in the enterprise.

In my Mob Programming sessions I want to show the team certain techniques, e.g. splitting a large class. The session needs a defined, explicit goal and everybody needs to be clear about his or her expectations. The format works well for showing a work-flow from end to end. Also people like sessions with big structural changes, e.g. refactoring towards a pattern. On the other hand they do not like extracting small structures, discussing names and other clean code related activities. People do not feel progress there. This might be a drawback of working with real code: the urge to make progress is higher than in a dojo. Maybe these sessions are only appropriate after the participants have gained a certain level of competence by attending Coding Dojos. Even with successful mob sessions I mix in a Coding Dojo every third time at least. In the dojo people work on the details and practise the basics. In some way mob sessions are orthogonal to dojos and as such a great addition to our learning tools.

get im!
I rotate drivers every ten to fifteen minutes, as proposed by the Mob Programming guide, to keep people active. Usually there are participants who do not want to type. These include junior team members, who are probably afraid of being criticised in front of the team, and on many occasions female developers. If someone does not want to drive, I ask why, but until now I did not get any clear answer. I guess some people do not want to type because they are slow. Once a senior guy refused but then did drive. He was horribly slow and used no short-cuts at all. He did not even find the rename refactoring in the Eclipse menu. It was embarrassing.

This kind of work is more demanding for me than dojos or even theory sessions. I rehearse my theory presentations and experiment with the planned kata before a dojo, but it is impossible to prepare fully for such a mob session. I ask the team to send me the code we will work with a few days in advance and then work through it in detail. Knowing the code up front makes me feel more secure and saves time because I do not have to read the code during the session. Such a session is team work and I do not have a strict outline except the defined goal. In fact I can never be sure about the right thing to do because it is not my code. I have no idea about its requirements, history and constraints. This is stressful because as mentor I am asked for guidance which I might not be able to give. It happens that I have to say that I do not know, which is not a problem but still makes me feel that I failed.

I have been facilitating Mob Programming sessions for almost a year with different teams in different companies. Some worked out great, some did not. Some were fun, some were just exhausting. Should you try it? Maybe, I do not know. Do I recommend it? No, I stick with Woody's style of "not recommending": no, I do not recommend it, I am just sharing because people have been asking about it. But I am genuinely interested in this mode of working and will continue to experiment with it. Unfortunately I have never participated in a real session, working on production code. That would be another kind of experience. (Hint, hint! All you mobs out there, I am open to your invitations ;-)

3 September 2014

Thoughts on Mastery

Rake
During the summer I was on vacation in Tyrol, which is an excellent place for hiking. A local magazine told the story of Josef Frauenschuh and his business of hand-crafted rakes. Josef is a true master craftsman: he is of old age, many years past official retirement. He creates large wooden rakes of the highest quality, using different kinds of wood for each part of the rake. His rakes are balanced and lightweight. He uses the best tools he can get, all of them self-made, and after 45 years he is still improving his tools and process. Obviously he does not accept any compromise. Once, when he needed more space to create the shafts, he simply made a hole in the wall of his workshop.

I liked the story a lot. It made me think about our craft and the concept of mastery.

I believe that one major aspect of mastery is age. It is not necessary in itself but rather a side effect of the time needed to master a subject. Masters are aged because they have been practising their craft for 30 years or more and still try to improve. Following this definition it is obvious that there cannot be many masters of software development, because our industry is still young. Only a few like Uncle Bob have been working in the industry long enough, whereas most of us have less than ten years of experience. The so-called senior developers with five years of experience are just young journeymen, if at all.

Rastrello Di Legno
Josef is working hard. Although he could retire any time, he is working five days a week because the demand for his rakes is high. He is a master and his rakes are masterpieces, but does he get rich? I do not think so. From the story I assume he has to put considerable effort into each rake, and sometimes the wood splinters and he has to start over. Being curious I compared prices: his rakes sell for around 70 Euro, whereas eBay has some for ten to 20 Euro, although no wooden ones. So yes, the masterpiece is up to five times more expensive than a regular one.

Still it seems that Josef is not earning five times more per hour than other workers. Could it be that master craftsmen are relatively underpaid because they put in extra work to create magnificent things? Maybe his rakes are expensive to compensate for the small number he is able to produce. That does not compare well to our industry. We (developers) are greedy. We expect a good salary matching our experience and competence. The more experience we have, the more money we want, which seems only fair. The never ending demand for IT professionals spoils us and salaries rise during the first years of junior developers' careers. At least they rise up to a certain point, which might be 15 years of experience, when we become "too senior" for most employers. But this is another story.

I do not know Josef Frauenschuh personally, but I believe him to be a true master craftsman. Instead of bragging about his achievements, he is modest and admits that sometimes he even has to listen to outsiders to improve his rakes. He does not need titles of seniority, nor is he proclaiming himself a master. I guess he considers himself still a simple carpenter. He is a great example. Some of his modesty would suit us software people well, and I will start with myself.

31 August 2014

In the meantime

Oops, summer is almost over and there are no new blog posts. But as you can see, I have been busy...

Demolishing The Brick Wall
The picture does not show me but my wife; the whole family is working hard on the current project.

Dust In The Attic
Unfortunately the building is not "soft" and the planned refactorings are difficult to implement.

31 May 2014

Game of Life Apache Ant

This post is about the second half of my quest to create Conway's Game of Life using a plain Apache Ant build script. In the first half I test-drove the code up to the target apply_rules, which determined if a given cell would be reborn, live on or die. This part is about applying the rules of the game to whole generations. Here are the test cases to identify the next generation.
<target name="all-test-targets" description="run all test targets">
    <antcall target="test-next-gen-living-with-no-neighbours-dies" />
    <antcall target="test-next-gen-empty-with-three-neighbours-comes-alive" />
</target>

<target name="test-next-gen-empty-with-three-neighbours-comes-alive">
    <make-tmp-dir />
    <antcall target="create_3x3_grid" />
    <touch file="${test.temp_dir}/.y/.x.x/cell" />
    <touch file="${test.temp_dir}/.y.y/.x/cell" />
    <touch file="${test.temp_dir}/.y.y/.x.x/cell" />

    <antcall target="next_generation">
        <param name="p_cell_location" value="${test.temp_dir}/.y/.x" />
    </antcall>
    <assert-file-exists file="${test.temp_dir}/.y/.x/next" />
</target>


<property name="cell.marker_will_live" value="next" />

<target name="next_generation" depends="apply_rules" if="cell.will_live"
      description="creates a 'next' file if a cell in the given location
                   'p_cell_location' will be in the next generation.">

    <touch file="${p_cell_location}/${cell.marker_will_live}" />
</target>
The target next_generation marks a cell which will live (on) by creating an empty marker file inside its folder.

Center of the Milky Way Galaxy (NASA, Chandra, 111009)
The Universe is (maybe) infinite
In the previous part I defined the universe as a folder structure containing all cells.
The universe is 2-dimensional and has a certain size. To support universes of arbitrary size I started testing for 2 times 2 grids,
<import file="lib/asserts.xml" />
<import file="universe-build.xml" />

<target name="test-create-empty-2x2-universe">
    <make-tmp-dir />

    <antcall target="create_universe">
        <param name="p_universe_location" value="${test.temp_dir}" />
        <param name="p_universe_size" value="2" />
    </antcall>

    <assert-file-exists file="${test.temp_dir}/universe/.y/.x" />
    <assert-file-exists file="${test.temp_dir}/universe/.y/.x.x" />
    <assert-file-exists file="${test.temp_dir}/universe/.y.y/.x" />
    <assert-file-exists file="${test.temp_dir}/universe/.y.y/.x.x" />
</target>
and then for 3 times 3, 4 times 4 and so on. The creation of each universe was straightforward, but there was a lot of duplication in the different creation targets. Creating a 1 by 1 universe was trivial and adding dimensions step by step seemed a good way to remove duplication.
<target name="create_universe_3" depends="create_universe_2">
    <antcall target="universe_add_dimension" />
</target>

<target name="create_universe_2" depends="create_universe_1">
    <antcall target="universe_add_dimension" />
</target>

<target name="create_universe_1" if="universe.location">
    <mkdir dir="${universe.location}/.y/.x" />
</target>
Increasing the grid's first dimension, the y- or row-dimension, of an existing grid is done by adding a new row folder .y.y.y. Adding to the second, x- or column-dimension means adding another column .x.x.x to each row folder .y, .y.y and so on. In Ant this can be done using copy and regexpmapper, adding another .x or .y, respectively, to the matched group of .xs or .ys.
<macrodef name="universe_add_row"
      description="copies the row folders and renames them as additional row.">
    <sequential>
        <copy todir="${universe.location}" includeEmptyDirs="true" overwrite="false">
            <fileset dir="${universe.location}" includes="${universe.struct}" />
            <regexpmapper from="^(.*\.y)(/.*\.x)$$" to="\1.y/\2" handledirsep="yes" />
        </copy>
    </sequential>
</macrodef>

<macrodef name="universe_add_column"
      description="copies the column folders and renames them as additional column.">
    <sequential>
        <copy todir="${universe.location}" includeEmptyDirs="true" overwrite="false">
            <fileset dir="${universe.location}" includes="${universe.struct}" />
            <regexpmapper from="^(.*\.x)$$" to="\1.x" />
        </copy>
    </sequential>
</macrodef>

<target name="universe_add_dimension" if="universe.location"
      description="adds one dimension in column- and row-folders.">
    <universe_add_column />
    <universe_add_row />
</target>
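For illustration (my own example, not from the original build): applying universe_add_dimension to a 2 by 2 universe first copies a .x.x.x column folder into every row and then copies the last row as .y.y.y, growing the grid to 3 by 3 leaf folders:

```
.y/.x        .y/.x.x      .y/.x.x.x
.y.y/.x      .y.y/.x.x    .y.y/.x.x.x
.y.y.y/.x    .y.y.y/.x.x  .y.y.y/.x.x.x
```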
That removed all duplicated mkdir commands in the creation targets. Still I was not happy with that solution because I needed to provide a target <target name="create_universe_N"> and all previous targets for each dimension N I wanted to support. I spent some time trying to resolve this but failed. Ant does not provide any kind of loop construct and does not allow for recursion when calling targets. (Using Ant-Contrib's extensions like For was out of the question because I wanted to write idiomatic Ant which is rather declarative than imperative.) So the current solution only supports grids up to a fixed dimension.

After creating universes of (almost) arbitrary size and seeding them I had to iterate all cells (folders) to update the whole universe at once. I did not need any new code, just a way to execute the cell-build.xml script inside each cell (folder) of the universe. After creating a (sort of integration) test for the desired functionality, I added a final target to the cell-build.xml that initialised the cell's location (p_cell_location) with the current build script's directory (${basedir}).
<target name="next_gen_for_basedir_cell"
      description="creates a 'next' file if the cell in the
                   basedir will be in the next generation.">

    <antcall target="next_generation">
        <param name="p_cell_location" value="${basedir}" />
    </antcall>
</target>
Then, after some experiments, I was able to use Ant's subant task to execute the logic for each cell inside each cell's folder.
<property name="universe.base_dir" value="universe" />
<property name="universe.struct" value="*/*" />
<property name="cell.build_file" value="cell-build.xml" />

<target name="next_generation" if="p_universe_location"
        description="applies the rules for next generation for each cell of
        the universe folder structure in location 'p_universe_location'.">

    <subant genericantfile="${cell.build_file}">
        <dirset dir="${p_universe_location}/${universe.base_dir}"
                includes="${universe.struct}" />
    </subant>
</target>
Including */* in the dirset selects only cells (the leaf directories) and no intermediate ones. I had never used subant before and I am amazed by its power. It is a great tool to get rid of duplication or complicated series of targets. Even using an Ant extension with its loop would not produce such a concise way of iterating all cells.

The above code creates empty marker files (next) for all cells that will be alive in the next generation. All that is left is to delete the old generation (${cell.marker_alive}) and rename the new generation (${cell.marker_will_live}) to the current one.
<target name="evolve" depends="next_generation" if="p_universe_location"
      description="evolves the whole universe folder structure in location 'p_universe_location'.">

    <property name="universe.location" location="${p_universe_location}/${universe.base_dir}" />
    <delete dir="${universe.location}" includes="${universe.struct}/${cell.marker_alive}" />
    <copy todir="${universe.location}" includeEmptyDirs="false" overwrite="false">
        <fileset dir="${universe.location}" includes="${universe.struct}/${cell.marker_will_live}" />
        <regexpmapper from="^(.*)${cell.marker_will_live}$$" to="\1${cell.marker_alive}" />
    </copy>
    <delete dir="${universe.location}" includes="${universe.struct}/${cell.marker_will_live}" />
</target>
The final test is more an acceptance test than a unit test because it exercises all rules and functionality by evolving a whole universe which has been seeded with a blinker.
<target name="test-evolve-updates-all-cells">
    <make-tmp-dir />

    <antcall target="create_universe">
        <param name="p_universe_location" value="${test.temp_dir}" />
        <param name="p_universe_size" value="3" />
    </antcall>

    <!-- seed blinker -->
    <universe_seed_cell location="${test.temp_dir}" row=".y.y" column=".x" />
    <universe_seed_cell location="${test.temp_dir}" row=".y.y" column=".x.x" />
    <universe_seed_cell location="${test.temp_dir}" row=".y.y" column=".x.x.x" />

    <!-- run GoL -->
    <antcall target="evolve">
        <param name="p_universe_location" value="${test.temp_dir}" />
    </antcall>

    <assert-file-does-not-exist file="${test.temp_dir}/universe/.y.y/.x/cell" />
    <assert-file-does-not-exist file="${test.temp_dir}/universe/.y.y/.x.x.x/cell" />

    <assert-file-exists file="${test.temp_dir}/universe/.y/.x.x/cell" />
    <assert-file-exists file="${test.temp_dir}/universe/.y.y/.x.x/cell" />
    <assert-file-exists file="${test.temp_dir}/universe/.y.y.y/.x.x/cell" />
</target>
It works! ;-) Here is proof in the form of a little animation of the Ant Game of Life in action:
Ant Game of Life Dir Listings

3 May 2014

The Life of (an) Ant

I already mentioned that Daniel impressed me by implementing Conway's Game of Life in MSBuild with MSBTest. Having been a Java developer for most of my career, I found Apache Ant the proper choice for building the cellular automaton with a build tool. I have worked with Ant before, so I was confident I would be able to pull it off.

Bring Out Your Dead
Idiomatic Ant
I wanted a challenge and decided to avoid Ant extensions whenever possible. For example there are if and for tasks in Ant-Contrib which enable an imperative, procedural coding style. Obviously embedding any kind of script was also considered cheating.

An Ant's Universe
Wikipedia says that the universe of the Game of Life is an infinite two-dimensional orthogonal grid of square cells. How could I represent such a grid using basic Ant data structures? Which types does Ant support at all? Ant offers immutable properties and property sets, files, directories and paths and supports text file filtering using regular expressions. Which data structure could I use for the universe? How about directories? The universe is a two-dimensional grid, containing rows and columns, which might be defined by the names of nested folders. A single row in the grid is a folder which contains the folders of all its columns, e.g.
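For example, a 2 by 2 universe maps to the following folder layout (reconstructed from the 3 by 3 grids used in this post's tests):

```
universe
|-- .y          (first row)
|   |-- .x      (first column)
|   `-- .x.x    (second column)
`-- .y.y        (second row)
    |-- .x
    `-- .x.x
```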
That looks promising. But how do we find the previous and following columns given a current column (folder) name? There are no arithmetic functions available in Ant, so there is no way to increase or decrease a number by one. But it is possible to append a character to a property, creating a new one, which allows reaching the cells (folders) to the right of and below the current one, e.g.
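A minimal sketch of this append trick, using the property names from the pref_self_next macro of this post:

```xml
<!-- appending '.x' to the current column name reaches the next column,
     e.g. '.x.x' becomes '.x.x.x' -->
<property name="cell.next_x" value="${cell.self_x}.x" />
```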
To access the left and upper neighbours of a cell, the name has to be shortened. Ant cannot shorten string properties, but the basename task can strip the extension off a file or directory name. That is why the row and column folders are called '.y' and '.x'.
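A sketch of the basename trick, using the same property names:

```xml
<!-- basename strips the '.x' suffix: for cell.self_x = '.x.x'
     this sets cell.prev_x to '.x' -->
<basename property="cell.prev_x" file="${cell.self_x}" suffix=".x" />
```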
Testing Framework
To do it "right", I needed a testing framework. Although there is Apache AntUnit, I used a simpler solution to test my build files. Inspired by my former colleague Andy Balaam's Test-Driven Ant macros I created the necessary assertion macros on the way. For example the following code asserts that a certain property is set:
<macrodef name="assert-propery-set">
    <attribute name="propertyname" />
    <sequential>
        <fail message="property '@{propertyname}' not set."
              unless="@{propertyname}" />
    </sequential>
</macrodef>
Tell me about your neighbours
Having the data structure and assertions in place, it was time for the first test. I started with tests for counting the neighbours of a given cell, expecting its implementation in target count_neighbours of cell-build.xml.
<import file="lib/asserts.xml" />
<import file="cell-build.xml" />

<target name="test-count-neighbours-in-middle-when-no-neighbours">
    <make-tmp-dir />
    <antcall target="create_3x3_grid" />
    <touch file="${test.temp_dir}/.y.y/.x.x/cell" />

    <antcall target="assert-number-of-neighbours">
        <param name="p_cell_location" value="${test.temp_dir}/.y.y/.x.x" />
        <param name="p_expected_neighbour_count" value="0" />
    </antcall>
</target>

<target name="create_3x3_grid" if="test.temp_dir">
    <mkdir dir="${test.temp_dir}/.y/.x" />
    <mkdir dir="${test.temp_dir}/.y/.x.x" />
    <mkdir dir="${test.temp_dir}/.y/.x.x.x" />
    <mkdir dir="${test.temp_dir}/.y.y/.x" />
    <mkdir dir="${test.temp_dir}/.y.y/.x.x" />
    <mkdir dir="${test.temp_dir}/.y.y/.x.x.x" />
    <mkdir dir="${test.temp_dir}/.y.y.y/.x" />
    <mkdir dir="${test.temp_dir}/.y.y.y/.x.x" />
    <mkdir dir="${test.temp_dir}/.y.y.y/.x.x.x" />
</target>

<target name="assert-number-of-neighbours" depends="count_neighbours">
    <assert-propery-equals propertyname="cell.neighbour_count"
                           expected="${p_expected_neighbour_count}" />
</target>
First I fulfilled the test with a fake solution.
<target name="count_neighbours">
    <property name="cell.neighbour_count" value="0" />
</target>
After eight more tests,
<target name="all-test-targets" description="run all test targets">
    <antcall target="test-count-neighbours-in-middle-when-no-neighbours" />
    <antcall target="test-count-one-neighbour-left" />
    <antcall target="test-count-one-neighbour-right" />
    <antcall target="test-count-one-neighbour-up" />
    <antcall target="test-count-one-neighbour-down" />
    <antcall target="test-count-one-neighbour-up-left" />
    <antcall target="test-count-all-neighbours-around" />
    <antcall target="test-count-neighbours-in-left-up-when-no-neighbours" />
    <antcall target="test-count-neighbours-in-lower-right-when-no-neighbours" />
</target>
count_neighbours was finished.
<property name="cell.marker_alive" value="cell" />

<target name="count_neighbours" if="p_cell_location"
      description="sets property 'cell.neighbour_count' to the count of living
                   neighbours of a cell in the given location 'p_cell_location'.">

    <dirname property="cell.base_row" file="${p_cell_location}" />
    <dirname property="cell.base" file="${cell.base_row}" />

    <pref_self_next dim="y" folder="${cell.base_row}" /> <!-- (1) -->
    <pref_self_next dim="x" folder="${p_cell_location}" /> <!-- (2) -->

    <resourcecount property="cell.neighbour_count"> <!-- (3) -->
        <fileset dir="${cell.base}">
            <include name="${cell.prev_y}/${cell.prev_x}/${cell.marker_alive}" />
            <include name="${cell.prev_y}/${cell.self_x}/${cell.marker_alive}" />
            <include name="${cell.prev_y}/${cell.next_x}/${cell.marker_alive}" />

            <include name="${cell.self_y}/${cell.prev_x}/${cell.marker_alive}" />
            <include name="${cell.self_y}/${cell.next_x}/${cell.marker_alive}" />

            <include name="${cell.next_y}/${cell.prev_x}/${cell.marker_alive}" />
            <include name="${cell.next_y}/${cell.self_x}/${cell.marker_alive}" />
            <include name="${cell.next_y}/${cell.next_x}/${cell.marker_alive}" />
        </fileset>
    </resourcecount>
</target>
Line (1) defines the properties for the previous, current and following row, i.e. in the y dimension and line (2) does the same for columns, i.e. in the x dimension. Both sets of properties are similar, so I extracted a macro to remove duplication. (Remember: Red - Green - Refactor)
<macrodef name="pref_self_next"
      description="defines properties 'cell.self_X', 'cell.prev_X' and 'cell.next_X' for
                   given dimension/orientation 'dim' from center of current 'folder'.">

    <attribute name="dim" />
    <attribute name="folder" />

    <sequential>
        <basename property="cell.self_@{dim}" file="@{folder}" />
        <property name="cell.next_@{dim}" value="${cell.self_@{dim}}.@{dim}" />
        <basename property="cell.prev_@{dim}" file="@{folder}" suffix=".@{dim}" />
    </sequential>
</macrodef>
After determining the names of the previous and following neighbouring cells (folders), counting the live ones (files) was done with Ant's resourcecount. Line (3) counts the number of files called 'cell' in the neighbouring folders.

To Be or Not To Be
With the number of living neighbours I was ready to implement the rules of the game right away. The target apply_rules uses this number to determine if a cell lives on or dies. With four tests,
<target name="all-test-targets" description="run all test targets">
    <antcall target="test-rules-with-two-neighbours-lives-on" />
    <antcall target="test-rules-with-one-neighbour-dies-on" />
    <antcall target="test-rules-empty-with-two-neighbours-stays-dead" />
    <antcall target="test-rules-empty-with-three-neighbours-comes-alive" />
</target>
apply_rules evolved to
<target name="apply_rules" depends="count_neighbours" if="p_cell_location"
      description="sets property 'cell.will_live' only if a cell in the given
                   location 'p_cell_location' will live on for the next generation.">

    <condition property="cell.will_live">
        <or>
            <and>
                <available file="${p_cell_location}/${cell.marker_alive}" />
                <equals arg1="${cell.neighbour_count}" arg2="2" />
            </and>
            <equals arg1="${cell.neighbour_count}" arg2="3" />
        </or>
    </condition>
</target>
Target apply_rules expects the property p_cell_location to point to a given cell (folder) in the universe. It first calls the target count_neighbours via depends and uses the cell.neighbour_count to determine if the given cell will be reborn, live on or die. This implements the rules of the game and ends this post. The next one will use apply_rules to run a full Game of Life on arbitrary universes. Stay tuned!

18 April 2014

How to divide learning time

As software delivery professionals we constantly need to learn and practice to keep our skills sharp. Last year I even spent three months pair programming to do exactly that. Possible activities required for learning and related to the sharing of knowledge are:
  • (a) studying, e.g. reading technical books or blog posts, watching (recorded) presentations as well as any form of consuming knowledge.
  • (b) practising, e.g. doing code katas, attending Coding Dojos or working on fun (but otherwise useless) projects.
  • (c) experimenting, which includes creating prototypes or trying out new frameworks or ideas.
  • (d) creating something real, e.g. hobby projects or maybe contributing to Open Source.
  • (e) documenting and sharing, e.g. creating how-to documents or writing blog posts.
  • (f) teaching, e.g. preparing presentations and talking at user group meetings.
Why are documenting, sharing and teaching considered learning activities?
Obviously sharing is not a learning activity in itself, but the process of analysing and describing things improves our understanding, similar to Rubber Ducking. For example writing an article about a topic is a great way to bring one's thoughts into order and learn more about it. Documenting and sorting knowledge is also part of PIM (Personal Information Management), which is related to learning as well. Teaching is considered a learning activity by most people. It is a great way to perfect the understanding of an idea. This is sharing on steroids, as you are forced to dig deep into the topic to be able to explain it to others.

Piano Lessons
How should I divide my learning time into activities (a) to (f)?
With the start of 2014 I came back to (sort of) regular work and obviously my personal time for learning became limited again. I was not sure if I was spending the available time in the most efficient way and looked for guidelines on how to divide my learning time between these activities. All activities (a) to (f) seem vital, but they likely do not have the same priority. For example there are so many important books to read that I could spend all my time on reading alone, which does not feel right to me. Sören Kewenig pointed out that items (a) to (f) might be seen as the phases of a project: a -> [c,d,b] -> e -> f, which also indicates that all these activities are important and related.

This question is subjective ;-)
As usual when I have a question I ask StackOverflow. In this case I asked on Programmers. Unfortunately the question was closed in less than 19 minutes as being off-topic. Many of the questions I ask on StackOverflow get closed, but 19 minutes is my personal record so far ;-) While it is true that learning is a subjective process, there might be some research that gives clear advice on how to optimise learning activities.

Comments suggest that practice is the key.
Nevertheless I was able to harvest at least some comments from my StackOverflow question. Robert Harvey wrote "if you're not writing code that illustrates the principles being taught, they won't stick. [...] I am saying that, without practice, the other stuff won't matter." I conclude that there needs to be a strong focus on (b) or (d) until the newly acquired material is incorporated into the daily work and applied every day. Then I asked my fellow craftsmen of the Softwerkskammer, the Software Craftsmanship Communities in Germany, Austria and Switzerland. Sören answered me that he used 50% of his free time for Coding Dojos (b) and 50% for working on Open Source (d). Jan Sauerwein spends his time on code katas (b) and preparing a lecture (f). Franziska said that she liked learning with others and therefore favoured (b), (c) and (f). Ulrich Merkel said that developing skills needs as much practice as possible - (b), (c) or (d), but probably he was talking about real work - together with discussion and feedback as corrective. So all answers agree that practice, either by real work or (b), (c) or (d) is important.

How I divided my learning time till now
Franziska asked me how I had divided my learning time up to then. Yes, I should have checked my own time before asking ;-) So I started tracking my learning time. For several weeks I spent an average of 10 hours a week on learning activities.
  • (a) studying: 20%. I spent some time reading. I am participating in an online book club which forces me to keep up reading and allows for discussions afterwards. I read the book while commuting. Further I browse my feeds of blogs and news to get an overview about what's going on and to stay aware of new trends.
  • (a) studying: 30%. I watched presentations. This includes user group presentations as well as recorded ones.
  • (b) practising: 20%. Once a week I meet online with a friend to do remote pair programming. We schedule the appointment up to one month in advance, to make sure it even happens when workload is high - and it usually works.
  • (d) creating stuff is still part of my job, although recently I am not doing as much development as I used to do.
  • (e) documenting: 10%. I used some time to look up slides of seen presentations and to take notes about what I have learned.
  • (e) sharing: 20%. An average of two hours went into writing blog posts, which was not enough to publish a full article but I worked on fragments of posts which eventually will be published.
  • (f) teaching is part of my new profession and I prepare a presentation now and then. Sometimes I also present something at user group meetings.
My Learning Time % Till Now
This looks pretty balanced; there are no huge gaps. I am surprised how much I read, because I always feel like I should read more but then am too tired most evenings. Maybe I should watch more demonstrations though. Also 20% practice might not be enough compared to the earlier comments by Robert and Sören.

The Learning Pyramid
Not all learning activities are equally effective. For example I remember a vivid discussion about something much better than just reading about it. Edgar Dale summarises these findings as Learning Retention Rate which ultimately leads us to the Learning Pyramid. In the Learning Pyramid, the learning activities are ordered by how much they pay off.

Learning Retention Rate % by Edgar Dale
When looking at these rates I see that I need to add another activity to my list: discussion, which might happen during user group meetings, practice with others or teaching. So let me revisit my tracked learning time and distribute it according to the pyramid.
  • Lecture, 5% retention. Next to real lectures that I sometimes attend, I consider some user group meetings to be in this category. Sometimes presentations are of mixed quality and there is not much discussion. 15% of my learning time (a) goes here.
  • Reading, 10% retention. I spend 20% of my time reading (a).
  • Audio-Visual, 20% retention. To be fair, not all presentations I see are boring, so I consider some of them to be of this category. Here I spend another 20% of my time from (a).
  • Demonstration, 30% retention. It is difficult to assign one of these categories to a typical user group talk or conference presentation. Some presenters give high quality demonstrations, but I see this rarely, so maybe 5% of my time goes here.
  • Discussion Group, 50% retention. Especially in the reading group (a) there is much discussion. Maybe 10% of my time is of this category.
  • Practice by doing, 75% retention. With two hours of pair programming code katas (b) each week I score 20% here.
  • Teaching others, 90% retention. As I teach as part of my day job (f), I did not track that time. Also I am left with two hours of documenting and blogging (e) which do not fit into the pyramid. To make things easier I will just award myself an honorary 10% here.
When multiplying my learning time by the retention rate I end up with a weighted learning time, let's call it my "effective" learning time:
My Effective Learning Time %
  • Lecture 2%,
  • Reading 5%,
  • Audio-Visual 10%,
  • Demonstration 4%,
  • Discussion 14%,
  • Practice 40%,
  • Teaching 24%.
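The weighting is simple arithmetic: multiply each activity's time share by its retention rate and normalise the products back to 100 percent. A minimal sketch (the class and method names are my own):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EffectiveLearningTime {

    // Normalise (time share in percent) x (retention rate) into an
    // "effective" share that adds up to 100 percent again.
    public static Map<String, Double> effective(Map<String, double[]> timeAndRetention) {
        double total = timeAndRetention.values().stream()
                .mapToDouble(tr -> tr[0] * tr[1]).sum();
        Map<String, Double> result = new LinkedHashMap<>();
        timeAndRetention.forEach((activity, tr) ->
                result.put(activity, tr[0] * tr[1] / total * 100));
        return result;
    }

    public static void main(String[] args) {
        // time share in percent, retention rate as fraction (figures from the post)
        Map<String, double[]> tracked = new LinkedHashMap<>();
        tracked.put("Lecture", new double[] { 15, 0.05 });
        tracked.put("Reading", new double[] { 20, 0.10 });
        tracked.put("Audio-Visual", new double[] { 20, 0.20 });
        tracked.put("Demonstration", new double[] { 5, 0.30 });
        tracked.put("Discussion", new double[] { 10, 0.50 });
        tracked.put("Practice", new double[] { 20, 0.75 });
        tracked.put("Teaching", new double[] { 10, 0.90 });

        effective(tracked).forEach((activity, percent) ->
                System.out.printf("%s %.1f%%%n", activity, percent));
    }
}
```

Running this yields roughly 2.0, 5.4, 10.7, 4.0, 13.4, 40.3 and 24.2 percent, which agrees with the rounded figures above within a percentage point.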

How I (maybe) should divide my learning time
It is time for a conclusion. I have spent some time analysing the topic - and this made some things clear to me - call it meta-learning. I do not know if the final distribution of my effective learning time is well balanced or not. On the one hand, the activities with smaller retention rates obviously do not pay off, so there is no point in doing more of them. On the other hand - maybe - there would be some benefit in spreading the learning across all activities. As the retention rates (5, 10, 20, ...) look (sort of) exponential, the effective learning time would also look exponential for equally distributed learning times.

Now I see that my feeling about not reading enough comes from the fact that I do not learn enough while reading, which is obvious when looking at my effective learning time for reading. Even if I added another hour of reading each week, it would not make much difference. But I still believe that it is important to read technical books, because they are a good way to cover a topic in depth. So I will stick to my reading routine of one to two hours each week. I will continue watching presentations, because I usually watch them during cardio workouts. I will add demonstrations, but I am not sure where to find them. Maybe I will look for user group meetings with live coding. I am doing a lot of practice already, and it is dominating my learning. I will have to think about this. Just half an hour a week moved from practice to teaching might improve my effective learning. So even if I give presentations as part of my day job, I will look for user groups willing to listen to me ;-)

As others already pointed out, different people have different learning styles. Learning styles might even change for the same person over time. I am very interested in your take on dividing learning time and how you think the few hours of personal learning could be spent best.

25 March 2014

Maven Site using Markdown

I like Markdown. It is a popular lightweight markup language, easy to read and easy to write. It is supported by many websites and text editors (and was recently standardised). I have used it for many years, and whenever I take notes, even in Notepad or on my mobile phone, I structure my text with at least * for list items and --- to separate paragraphs. Recently, when I had to publish some documentation for a project, I was not surprised to find my existing notes in Markdown. So why not generate the final document from them?
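As an illustration (my own, not from my actual notes), such a minimal note might look like:

```markdown
* buy milk
* post letter

---

* call the bank on Monday
```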

Newley Marked Down Misspelled Sale Sign Fail
Apache Maven can be used to build a project site. If you are a Java developer you have probably seen some of these sites already. But the auto-generated Maven reports are not much use if you want to add real documentation. The recommended formats for creating content are APT or Xdoc. While the APT format is similar to Markdown, I did not want to convert my documents and went off tweaking Doxia for Markdown. This took me an extra two hours, but hey - work is supposed to be fun sometimes too, right?

Maven Doxia is the content generation framework used by the site plugin. It supports Markdown since Doxia 1.3. The site content is separated by format and Markdown files need to be inside a markdown folder and named *.md.
|-- pom.xml
`-- src
    |-- main
    |-- test
    `-- site
        |-- markdown
        |   `-- Readme.md
        `-- site.xml
To enable markdown processing in Maven site the doxia-module-markdown has to be added to your pom.xml.
        <!-- add optional Markdown processor -->
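The snippet is cut off above; a sketch of the usual site plugin configuration, where the plugin and Doxia versions (3.3 and 1.3 here) are assumptions to check against your build:

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-site-plugin</artifactId>
            <version>3.3</version>
            <dependencies>
                <!-- add optional Markdown processor -->
                <dependency>
                    <groupId>org.apache.maven.doxia</groupId>
                    <artifactId>doxia-module-markdown</artifactId>
                    <version>1.3</version>
                </dependency>
            </dependencies>
        </plugin>
    </plugins>
</build>
```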
During mvn site all Markdown files are translated to HTML and put into the target/site folder. To add them to the menu of the generated site, you have to add links to the decoration descriptor of Doxia, also known as site.xml.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/DECORATION/1.4.0">
    <body>
        <menu name="Documents">
            <item name="Readme" href="Readme.html" />
        </menu>

        <menu ref="reports" />
    </body>
</project>

Done. Success. Profit. ;-)

21 February 2014

Assert or MatcherAssert

#9 difference
I regularly meet online with friends and practise remote pair programming. Last time, while working with my friend Thomas, one of us mistyped the static import for assertThat and we ended up with an import of org.hamcrest.MatcherAssert instead of org.junit.Assert. I had never seen MatcherAssert before and Thomas asked me if I knew what the actual difference between them was. I did not know, but we are going to find out right now.

A little bit of history
The Hamcrest Matchers have been part of JUnit 4 since version 4.4. JUnit up to version 4.10 shipped with Hamcrest 1.1, and with release 4.11 it switched to Hamcrest 1.3. Looking at the history of MatcherAssert it seems that it had been in the core already but had been moved out again before version 1.1 was released.
  • May 22, 2007 - Moved assertThat() into hamcrest-core.
  • May 23, 2007 - Moved assertThat() back into hamcrest-integration (really this time).
  • July 19, 2007 - JUnit 4.4 shipped, including Hamcrest 1.1.
  • November 25, 2008 - Moved MatcherAssert to core.
  • July 9, 2012 - Hamcrest version 1.3 released.
  • November 14, 2012 - JUnit 4.11 shipped.
Show me the code!
Now let us compare the actual source code. JUnit's Assert.assertThat looks like
public static <T> void assertThat(T actual, Matcher<T> matcher) {
   assertThat("", actual, matcher);
}

public static <T> void assertThat(String reason, T actual,
                                    Matcher<T> matcher) {
   if (!matcher.matches(actual)) {
      Description description = new StringDescription();
      description.appendText(reason);
      description.appendText("\nExpected: ");
      description.appendDescriptionOf(matcher);
      description.appendText("\n     got: ");
      description.appendValue(actual);
      description.appendText("\n");

      throw new AssertionError(description.toString());
   }
}
whereas Hamcrest's MatcherAssert code is
public static <T> void assertThat(T actual, Matcher<? super T> matcher) {
   assertThat("", actual, matcher);
}

public static <T> void assertThat(String reason, T actual,
                                    Matcher<? super T> matcher) {
   if (!matcher.matches(actual)) {
      Description description = new StringDescription();
      description.appendText(reason)
                 .appendText("\nExpected: ")
                 .appendDescriptionOf(matcher)
                 .appendText("\n     but: ");
      matcher.describeMismatch(actual, description);

      throw new AssertionError(description.toString());
   }
}

public static void assertThat(String reason, boolean assertion) {
   if (!assertion) {
      throw new AssertionError(reason);
   }
}
assertThat(String, boolean)
The most obvious difference is that Hamcrest defines an additional assertThat method which accepts a boolean expression. The method is identical to Assert.assertTrue(String, boolean). No big deal.

<? super T>
The signatures of both classes are almost identical - almost, because Hamcrest has a variation of the generic type T. JUnit accepts matchers that have the same type as the actual value in the argument list T actual, Matcher<T> matcher, while Hamcrest also accepts them for super types of T in T actual, Matcher<? super T> matcher. Allowing parent types of T makes sense because what matches the parents of T will always match T too. Here is an example of a Matcher<Number> matching against an Integer, which extends Number.
import org.hamcrest.CustomMatcher;
import org.hamcrest.Matcher;
import org.hamcrest.MatcherAssert;
import org.junit.Assert;

public void shouldAllowSuperTypeMatcherInAssert() {
   Integer actual = 3;
   Matcher<Number> matcher = new CustomMatcher<Number>("anything") {
      public boolean matches(Object anything) {
         return true;
      }
   };
   MatcherAssert.assertThat(actual, matcher); // (1)
   Assert.assertThat(actual, matcher); // (2)
}
Line (1) binds <T> to <Integer> because <? super Integer> allows Number as the type of the given matcher. On the other hand, line (2) compiles as well, setting <T> directly to <Number>. So as far as type inference is concerned, the two signatures are equivalent.
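The difference in the generic bounds only surfaces when type inference is taken out of the picture. Here is a minimal, self-contained sketch; the tiny SimpleMatcher interface and the exact/contra helpers are hypothetical stand-ins for the JUnit and Hamcrest signatures, not real library code.

```java
// Hypothetical stand-ins for the two assertThat signatures,
// reduced to their generic bounds.
interface SimpleMatcher<T> {
    boolean matches(T item);
}

class VarianceDemo {
    // JUnit-style: the matcher type must equal the inferred T
    static <T> boolean exact(T actual, SimpleMatcher<T> matcher) {
        return matcher.matches(actual);
    }

    // Hamcrest-style: the matcher may target any super type of T
    static <T> boolean contra(T actual, SimpleMatcher<? super T> matcher) {
        return matcher.matches(actual);
    }

    public static void main(String[] args) {
        SimpleMatcher<Number> anyNumber = n -> true;
        Integer actual = 3;

        // With explicit type arguments only the contravariant version
        // accepts a matcher of a super type:
        System.out.println(VarianceDemo.<Integer>contra(actual, anyNumber)); // prints true
        // VarianceDemo.<Integer>exact(actual, anyNumber); // would not compile

        // With inference both compile, because T is widened to Number:
        System.out.println(exact(actual, anyNumber)); // prints true
    }
}
```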

Description of the Mismatch
The third difference is the way the two assertions describe the mismatch. While JUnit just appends the actual value with description.appendValue(actual), Hamcrest asks the matcher to describe its mismatch by matcher.describeMismatch(actual, description), which might give more information in case of failure. So let us dig into some more code.
import static org.hamcrest.CoreMatchers.is;

public void shouldDisplaySimilarMessageForIsMatcher() {
   int actual = 1;
   Matcher<Integer> matcher = is(2);
   assertJUnitMessage(actual, matcher,
      "Message\nExpected: is <2>\n     got: <1>\n");
   assertHamcrMessage(actual, matcher,
      "Message\nExpected: is <2>\n     but: was <1>");
}
As BaseMatcher's describeMismatch just appends "was " followed by the actual value, there is no real difference in the output of the two assertions. What about other matchers? Let us look for implementations of describeMismatch that do more than show the actual value. The only one I found was org.hamcrest.TypeSafeMatcher, which has several subclasses in the Hamcrest library, e.g. HasProperty, IsEmptyIterable and IsCloseTo.
import org.hamcrest.beans.HasProperty;
import org.hamcrest.collection.IsEmptyIterable;
import org.hamcrest.number.IsCloseTo;

public void shouldDisplaySimilarMessageForHasPropertyMatcher() {
   Object actual = "abc";
   Matcher<Object> matcher = HasProperty.hasProperty("prop");

   assertJUnitMessage(actual, matcher,
      "Message\nExpected: hasProperty(\"prop\")\n     got: \"abc\"\n");
   assertHamcrMessage(actual, matcher,
      "Message\nExpected: hasProperty(\"prop\")\n     but: no \"prop\" in \"abc\"");
}

public void shouldDisplaySimilarMessageForIsEmptyIterableMatcher() {
   Iterable<String> actual = Collections.<String> singletonList("a");
   Matcher<Iterable<String>> matcher = IsEmptyIterable.<String> emptyIterable();

   assertJUnitMessage(actual, matcher,
      "Message\nExpected: an empty iterable\n     got: <[a]>\n");
   assertHamcrMessage(actual, matcher,
      "Message\nExpected: an empty iterable\n     but: [\"a\"]");
}

public void shouldDisplaySimilarMessageForIsCloseToMatcher() {
   double actual = 2.0;
   Matcher<Double> matcher = IsCloseTo.closeTo(1, 0.1);

   assertJUnitMessage(actual, matcher,
      "Message\nExpected: a numeric value within <0.1> of <1.0>\n     got: <2.0>\n");
   assertHamcrMessage(actual, matcher,
      "Message\nExpected: a numeric value within <0.1> of <1.0>\n     but: <2.0> differed by <0.9>");
}
Hamcrest creates slightly more detailed error messages, but only for these three cases.
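Writing such a mismatch description for your own matcher is straightforward. Here is a minimal, self-contained sketch of the mechanism; the DescribingMatcher interface, the EvenNumber matcher and the check helper are hypothetical reductions of Hamcrest's Matcher/Description pair, not real library code.

```java
// A hypothetical, self-contained reduction of Hamcrest's Matcher and
// Description, showing how describeMismatch enriches the failure message.
interface DescribingMatcher<T> {
    boolean matches(T item);
    void describeMismatch(T item, StringBuilder description);
}

class EvenNumber implements DescribingMatcher<Integer> {
    public boolean matches(Integer item) {
        return item % 2 == 0;
    }

    // custom mismatch text, analogous to TypeSafeMatcher.describeMismatchSafely
    public void describeMismatch(Integer item, StringBuilder description) {
        description.append("<").append(item).append("> is odd");
    }
}

class MismatchDemo {
    // Hamcrest-style check: on failure, ask the matcher itself for details
    static <T> String check(T actual, DescribingMatcher<? super T> matcher) {
        if (matcher.matches(actual)) {
            return "";
        }
        StringBuilder description =
            new StringBuilder("\nExpected: an even number\n     but: ");
        matcher.describeMismatch(actual, description);
        return description.toString();
    }

    public static void main(String[] args) {
        System.out.println(MismatchDemo.check(3, new EvenNumber()));
    }
}
```

The JUnit-style assertion would only report got: <3>; asking the matcher yields the more helpful but: <3> is odd.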

Conclusion
org.hamcrest.MatcherAssert is not a replacement for org.junit.Assert because it does not come with all the assertions we are used to. But MatcherAssert contains a slightly different assertThat. Using that method could give better error messages because the matcher is asked to describe the mismatch. With a complex, custom matcher this could improve the error messages and speed up error diagnosis. Currently only three matchers implement their own mismatch descriptions, and they come with the Hamcrest library, which is not bundled with JUnit right now.

6 February 2014

Write about what you do

This is the third part of my series about personal branding. It started with some advice for a friend, but the content kept growing and growing. This is supposed to be a step-by-step guide for software developers to build their own personal brand. The first steps are branding all your accounts and defining a motto. The next steps to gain some reputation are sharing content using Twitter and contributing to Stack Overflow. Now it is time to talk about writing.

Maintain a Technical Blog
Blogging is a huge area and many things have been written about it. I am neither a social media expert nor a professional blogger, so I recommend you first do some research on how to write blog posts. Start with Scott Hanselman's excellent article on how to Keep Your Blog from Sucking. And yes, I feel bad promising information about blogging and then sending you off to STFW. Instead I will only comment on a few rules you will find.

Blogging Frequency
I have seen advice to create at least eight blog posts a month, or even more, to keep readers engaged and attract new ones as well. But let's face it, blogging is hard work. Even if you are a natural writer you still need to collect the information, maybe add code samples, put it all together and format it nicely. This takes time, and unless you are writing blog posts on company time, you likely do not have that much to spare. Compared to sharing on Twitter, blogging is more heavy-weight and therefore more likely to be postponed. If you are disciplined you could set aside a time in your weekly schedule to work on the next blog post. I am not, and I barely manage to write two posts per month. I keep telling myself that a blog post needs as much time to write as it needs; there is no point in forcing it. I would rather publish nothing than a poorly written or badly formatted post.

Content is King
Obviously content is king, so what should you write about? If there is something you want to say, just write it. While staying authentic, you write mainly for yourself. For example write a rant to get over something that made you angry. (But rants should not make up most of your writing.) Next write for your readers and for search engines. Dan North once said: "If you did something and it worked, then write about it. If it did not work, write about it as well, you never know how other people will use it." Following this advice creates the most useful articles, those which share insight or teach new skills. Especially how-to posts are helpful to people and get a lot of traffic from search engines.

Some other things you could write about are recaps of conferences or events, reviews of books or a little fun post from time to time. A great source for blog content is your mailbox. Some of your email conversations do not need to be private and might be useful for other people as well. In such a case just write a blog post instead. Even if you do need to write the email, you can always reuse the content and put it on your blog, too. I used this technique while working for a past employer. Whenever I wrote a technical email or wiki page, I considered reusing the content on my blog. For example here is an email I wrote to the whole team some time ago. Companies like IBM even encourage you to cross-post internal content to your public blog, but you really need to be careful about sharing internal information.

Size Does Matter
More content is not always better. I got feedback from one of my readers that he likes my posts because they are short and he is able to read them quickly. Although I do not mind longer articles myself, I do not read them immediately when checking my feeds. They stay in my list of articles to read, but this list is ever growing, so chances are I will never read them at all. So a total of 500 words seems to be a reasonable size for a blog post. 500 words equal 3500 to 4500 characters, which is exactly the average size of my blog posts, and this one is at this very sentence crossing that boundary.

Focus on Quality
As with sharing, when writing never focus on the number of followers. There are plenty of articles on the web promising to attract many followers for you. Do not waste your time with these. Instead care about the quality of your blog, e.g. come back and update your posts in case of mistakes. Also focus on the overall interaction: answer comments promptly and follow up on questions.

Write a Technical Article
If you enjoy writing, you could consider writing full-blown articles for technical magazines or developer portals. These reach a different audience and are more reputable because they are real publications, even in an academic sense. But they are also much harder to create. First your topic needs to fit the current theme of the magazine. Not all ideas are welcome: if the editor thinks your topic is not of interest to many readers, your proposal will be rejected. On your own blog, on the other hand, you can publish whatever you like. So depending on your area of specialisation or interest, finding a publisher might be difficult. Publishers have strict guidelines for the content and (the ones I worked with) expect at least 2500 words per article, five times the work of a regular blog post. You might have to revise your article several times until it is accepted. The articles I wrote some years ago took me between 20 and 30 hours each to get published.

That ends my description of the basic steps you should take to build your own personal brand. There are some more advanced things you could do, e.g. public speaking, but I will leave them for another time.

3 February 2014

Gain Reputation by Sharing

In the previous article I talked about the foundation of your personal brand, your reputation. Reputation means being recognized for something, at least by your peers. But to be known for something you actually have to do something, e.g. create or share things of interest.

So Just Start Sharing
The easiest way to share is to re-share. Start using Twitter and follow some people related to your interests. These may be authors of books you consider important or well known members of the community or just people you know and want to listen to. If they share something you consider worth reading, then re-share (re-tweet) it. Also share links whenever you find an interesting article or useful tool. There exists a bookmark widget that makes sharing to Twitter very easy. If you do all that then I am already interested in your Twitter stream because you aggregate information about a certain topic and reduce noise.

Do Not Over-share
Start slowly and do not over-share. Be sensitive about what you share. For example political or religious topics might be controversial and will not help you - unless that is what you want to be known for. I especially like the rules given by Martin Nielsen: "Would your (hypothetical) grandchildren be proud of what you share? If not then probably you should not share it. And if you do not smile while writing it, you should not share it at all." I have violated these rules myself a few times as you can see in my blog rants, but exceptions prove the rule. ;-)

Promote Yourself
Many social media tools, like Twitter, Facebook or Google+ allow you to promote content by sharing links or snippets of text. Using these tools you can and should promote yourself and tell the world what you are doing to build your own brand. But if all your content is (self)-marketing it will not work. People will lose interest because there is no benefit in following you. While working for IBM I had the opportunity to listen to a presentation by Luis Suarez, the email-less man, about social media and he advised that eight out of ten messages should not be (self)-marketing, but interesting and useful information.

Engage Others and Communicate
After some time, when you feel comfortable with sharing interesting things via social media tools, it is time to communicate directly with others. This can be as simple as answering tweets or commenting on blog posts you read. Saying thanks or asking questions is a great way to start a conversation. This is exactly how I got in touch with the great people organizing my favourite conference. While preparing to visit GeeCON I shared some ideas for the community day on the GeeCON blog. It was not a big thing, just one or two sentences. Then during the conference, when I passed Adam of the GeeCON staff, he stopped and we had a nice chat. He had recognized the Code Cop logo on the sleeve of my T-shirt. (By the way, GeeCON 2014 is coming up and they have great speakers announced, so you might consider going there.)

Be Authentic (Again)
I already mentioned in the first part that you need to be authentic. If you have nothing to say then do not try to make up fake conversations. If you read a blog post and do not have anything to add or discuss, then just leave it alone. I rarely add comments to blog posts. If I have nothing to say, I just shut up.

Create Genuine Content
After sharing and commenting, the next step is creating content of your own, which usually means maintaining a programmer blog. While this is true, I recommend something easier to start with: Stack Overflow. I am sure that you use Stack Overflow. So why not register an account and start contributing? Even if you are just reading answers, you should vote on questions and answers that are useful to you. Then you could make a habit of fixing badly formatted questions and (given some reputation) adding tags. Sooner or later you will have a reasonable question to ask or even know the answer to an existing question. Stack Overflow is an excellent training ground for future writers because questions and answers should be correct, concise and properly formatted, just like your blog posts, only much shorter. See a recent presentation by Jeff Atwood about how to be a good Stack Overflow citizen for details on how to work best with Stack Overflow.

The next part will be on maintaining a technical blog. While I am busy writing about it, I expect a lot of interesting material to appear in your Twitter stream.