22 March 2013

Twitter Command Line Backup

I used to be a low-volume Twitter user. I would not connect with people who had more than 1000 tweets because they seemed too "high-volume" for me. But over time the number of my tweets rose as well, and I managed to create a decent tweet now and then.

Saving My Tweets
As soon as I noticed that my tweets were "piling up", I thought about backing them up. This would have to work incrementally, as older tweets might vanish from my timeline. I searched for an online tool but did not find anything useful. While searching for offline tools, I found TwitterBackup by Johann Burkard. It is a great tool and does exactly what I need: incremental backup of tweets. It has a small user interface and works out of the box. I recommend it if you like using UIs. (Note that the password input field in the main window is not used; you do not have to provide your password there.)

GUIs are for wimps ;-)
I prefer to start my tools from the command line, especially if I plan to run them periodically. Fortunately Johann Burkard provided his TwitterBackup under an MIT license, so I forked the source and "mavenized" the project. Johann had kept his logic separated from the UI, which made it easy to remove the user interface and add command line parameters instead. A few days later I added support to back up favorites and retweets as well. The current version supports the following commands:
E:\>java -jar twitter-backup-cli-3.1.8.2-jar-with-dependencies.jar -h
usage: twitter-backup-cli [-u <twitter handle>] [-f <backup file>]
Backup Twitter Tweets with TwitterBackup (command line).
 -f,--file <arg>         File to save tweets to (saved to system)
 -fv,--favorites         Load favorites instead of tweets (default=tweets)
 -h,--help               Print this usage information
 -o,--port <arg>         HTTP proxy port for web access
 -p,--proxy <arg>        HTTP proxy URL for web access (saved to system)
 -r,--reset-preferences  Do not load preferences from system
 -rt,--retweets          Also load retweets in timeline (default=false)
 -si,--sinceId <arg>     Load tweets/favorites since the given id (default=all)
 -t,--timeout <arg>      Timeout in ms between calls to Twitter (default=10500)
 -u,--username <arg>     Twitter username to load tweets from (saved to system)
For general instructions, see Johann's website at: http://is.gd/4ete
Note that both the original and the new version save some of their parameters to java.util.prefs.Preferences, so you need to provide your credentials only once.
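
To illustrate the "saved to system" behaviour, here is a minimal sketch of how a command line value could be remembered with java.util.prefs.Preferences. It is not taken from TwitterBackup; the class name BackupSettings, the node path, and the key names are assumptions made up for this example.

```java
import java.util.prefs.Preferences;

final class BackupSettings {
    // Hypothetical node path and key names, invented for this sketch.
    private static final String NODE = "/example/twitter-backup-cli";
    static final String KEY_USERNAME = "username";
    static final String KEY_FILE = "file";

    private final Preferences prefs;

    BackupSettings(Preferences prefs) {
        this.prefs = prefs;
    }

    /** Uses a node below the current user's preference root. */
    static BackupSettings forCurrentUser() {
        return new BackupSettings(Preferences.userRoot().node(NODE));
    }

    /**
     * Returns the command line value if one was given and remembers it;
     * otherwise falls back to the stored value (null if nothing was stored).
     */
    String resolve(String key, String commandLineValue) {
        if (commandLineValue != null) {
            prefs.put(key, commandLineValue);
            return commandLineValue;
        }
        return prefs.get(key, null);
    }

    public static void main(String[] args) {
        BackupSettings settings = BackupSettings.forCurrentUser();
        String fromCommandLine = args.length > 0 ? args[0] : null;
        System.out.println("username: " + settings.resolve(KEY_USERNAME, fromCommandLine));
    }
}
```

Run it once with a handle as argument and once without: the second run should print the handle remembered from the first.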

Usage Patterns
Download the binary JAR from my Maven repository and provide your Twitter handle, e.g. -u codecopkofler. Use -f tweets.xml -rt to back up all tweets and retweets, and use -f favorites.xml -fv to save all your favorites. Finally, decide where the backup should start, e.g. -si 286077755567779841. The value 286077755567779841 is Twitter's id of my first tweet in the year 2013, whereas 152504278093795328 was my first in 2012, and so on. With -si I separate my tweets into yearly backup chunks.
Loading Tweets
read 27 tweets from tweets.xml
loading http://api.twitter.com/1/statuses/user_timeline.xml?...
9 new tweets downloaded, 36 total
waiting 10500ms
loading http://api.twitter.com/1/statuses/user_timeline.xml?...
no new tweets found
saving backup to tweets.xml
Loading Favorites
read 0 tweets from favorites.xml
loading http://api.twitter.com/1/favorites.xml?...
1 new tweets downloaded, 1 total
waiting 10500ms
loading http://api.twitter.com/1/favorites.xml?...
no new tweets found
saving backup to favorites.xml
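
The counts in the logs above ("9 new tweets downloaded, 36 total") come from merging each download into the existing file. A rough sketch of such an incremental merge follows. It is not the code of TwitterBackup: the class IncrementalBackup is hypothetical, tweets are reduced to an id and a text, and I assume that the since id itself is excluded.

```java
import java.util.Map;
import java.util.TreeMap;

final class IncrementalBackup {
    /**
     * Merges freshly downloaded tweets into an existing backup.
     * Tweets are keyed by id, so a tweet that was saved earlier is not duplicated.
     * Tweets with an id not greater than sinceId are skipped (assumption: exclusive bound).
     * Returns the number of tweets that were actually added.
     */
    static int merge(Map<Long, String> backup, Map<Long, String> downloaded, long sinceId) {
        int added = 0;
        for (Map.Entry<Long, String> tweet : downloaded.entrySet()) {
            boolean inRange = tweet.getKey() > sinceId;
            if (inRange && backup.putIfAbsent(tweet.getKey(), tweet.getValue()) == null) {
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        // Made-up ids and texts, for illustration only.
        Map<Long, String> backup = new TreeMap<>();
        backup.put(101L, "older tweet");

        Map<Long, String> downloaded = new TreeMap<>();
        downloaded.put(101L, "older tweet");
        downloaded.put(102L, "newer tweet");

        int added = merge(backup, downloaded, 100L);
        System.out.println(added + " new tweets downloaded, " + backup.size() + " total");
    }
}
```

Keying by id makes repeated runs harmless: running the merge twice with the same download adds nothing the second time.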

12 March 2013

Complexity Slope

GeeCON is a great conference with many good presentations. Last year, a talk by Keith Braithwaite on Measuring the Effect of TDD on Design particularly piqued my interest. He talked about Cyclomatic Complexity and how code with tests is more strongly biased towards simpler methods than code without tests. Keith started this interesting research in 2006, and you can read everything about it in a series of blog posts about complexity and test-first. Unfortunately there is no recording of the session Keith gave at GeeCON, but he gave a similar talk at QCon 2008. I will not repeat his research here, so you should read the series of posts or at least watch the recording to continue.

Complexity Slope of Apache Harmony 1.6M3
Measure It!
Keith wrote a tool called Measure to harvest the distribution of Cyclomatic Complexity throughout a code base. It is based on Checkstyle and analyses Java source code. It worked out of the box, so I gave it a try.
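
To give an idea of what such a number means, here is a small sketch of how a slope could be derived from the distribution. It reflects my reading of Keith's posts and is not the code of Measure, which may calculate it differently. I assume that the slope is the negated gradient of a least-squares line through the points (log of complexity, log of number of methods with that complexity). The class ComplexitySlope and the sample histogram are made up.

```java
import java.util.Map;
import java.util.TreeMap;

final class ComplexitySlope {
    /**
     * Fits a least-squares line through the points (ln complexity, ln method count)
     * and returns the negated gradient, so a steeper fall-off gives a larger value.
     * Entries with a complexity or count below 1 are ignored (logarithm undefined).
     */
    static double slope(Map<Integer, Integer> methodsPerComplexity) {
        int n = 0;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (Map.Entry<Integer, Integer> e : methodsPerComplexity.entrySet()) {
            if (e.getKey() < 1 || e.getValue() < 1) {
                continue;
            }
            double x = Math.log(e.getKey());
            double y = Math.log(e.getValue());
            n++;
            sumX += x;
            sumY += y;
            sumXY += x * y;
            sumXX += x * x;
        }
        double denominator = n * sumXX - sumX * sumX;
        if (n < 2 || denominator == 0) {
            throw new IllegalArgumentException("need at least two distinct complexity values");
        }
        double gradient = (n * sumXY - sumX * sumY) / denominator;
        return -gradient;
    }

    public static void main(String[] args) {
        // Made-up histogram: complexity -> number of methods with that complexity.
        Map<Integer, Integer> histogram = new TreeMap<>();
        histogram.put(1, 1024);
        histogram.put(2, 256);
        histogram.put(4, 64);
        histogram.put(8, 16);
        System.out.printf("complexity slope: %.2f%n", slope(histogram));
    }
}
```

In the sample histogram the number of methods drops to a quarter whenever the complexity doubles, which corresponds to a slope of two. A code base dominated by simple methods falls off faster and scores higher.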

Some Numbers
My current project, a large RCP application, has a complexity slope as bad as 1.75. It has really bad code: there are many large and complex methods and no tests at all. (There were no tests when I joined the project last year. Now with me on the project, the number of tests is growing slowly but steadily ;-) I am wondering if the framework used has an impact on the complexity slope as well. RCP is known for its complex dependency structure caused by its overuse of the singleton pattern.

Another project that I worked on some years ago was a large web application, and it was probably as bad as the current one. After five years of heavy refactoring and retrofitting with unit tests up to 60% code coverage, it scored almost two (1.99). It seems to be a special case. The application was never test-first and as such should score below two. The refactoring activities did not target complexity but were guided by simple metrics and layering rules.

Usage
If you want to see the complexity slope of your projects, download measure-0.3.1.zip and unpack it. It contains everything you need to run Measure. There is a bash script for Linux, and I added a batch file for Windows. If you run one of them without arguments, it will print a short help message. I ran it against my little BDD testing framework and it showed me a nice score of almost three (2.94). This was expected as I had used strict TDD while developing it. After my latest refactoring, where I reduced the complexity of the story parser considerably, it even scored beyond three (3.12).

Complexity Slope of BaDaDam Testing-Framework
What about Ruby?
Keith's Measure supported Java by using Checkstyle under the hood. In fact it did little more than parse the Checkstyle complexity report with a threshold of zero, so that all complexity values were reported. I used Saikuro, a Ruby Cyclomatic Complexity Analyser, to implement the same for Ruby. First I created a Saikuro Rake task and set Saikuro's complexity threshold to zero,
state_filter = Filter.new(0)
state_formater = Saikuro::StateHTMLComplexityFormater.new(STDOUT, state_filter)
...
idx_states, idx_tokens = Saikuro.analyze(@files, state_formater, nil, @output_dir)
write_cyclo_index(idx_states, @output_dir)
Then I ran Saikuro to generate the complexity report into some folder,
rake -f ruby\saikuro_task.rb dir=some_folder
added a Saikuro parser to Measure,
measure -k Saikuro -l some_folder
and wrote some glue code to bring all these things together,
measure_ruby -t JavaClass -d E:\Develop\Ruby\JavaClass\lib

Complexity Slope of JavaClass (Ruby)
Note that the scripts are only available for Windows, but the Rake task and Java code work on all platforms. Also note that Saikuro messes up RDoc, so both cannot be in the same Rake file.

11 March 2013

Moved to SlideShare

After watching a presentation about SlideShare by Luis Suarez, the "E-mail-less Man", I decided to give it a try. I brushed up some of my past presentations and uploaded them to SlideShare. New slides will be added occasionally, so stay in touch. Thank you Luis for motivating me ;-)