8 May 2013

XSLTunit Ant Support

Last year I did the Prime Factors Kata in XSLT. Using XSLTunit I created a test case that applied the template to a number to calculate its prime factors. I focused on the coding problem and ignored the testing infrastructure, calling the XSLT processor from the command line and looking into the generated XML test result to see if any assert had failed. When I felt like playing with XSLT again, I first had to fix the infrastructure and make it ready for Continuous Integration.

Ant Support
CI tools like Jenkins are able to call any script, but my first idea for build automation is always Ant. I have some history with Ant but most likely I use it because I am an old school Java developer ;-). Fortunately Ant has built-in XSLT support which I can use to apply the XSLTunit template.
<property name="test_prefix" value="tst_" />
<property name="test_result_suffix" value=".test_result.xml" />

<!-- apply style to XMLs in suite -->
<xslt basedir="${suite}" destdir="${suite}" style="${suite}/${style}">
  <include name="*.xml" />
  <exclude name="*${test_result_suffix}" />
</xslt>

<!-- apply test -->
<xslt basedir="${suite}" destdir="${suite}" style="${suite}/${test_prefix}${style}"
      extension="${test_result_suffix}">
  <include name="*.xsl" />
  <exclude name="${test_prefix}*.xsl" />
</xslt>

<!-- create readable HTML test report -->
<xslt basedir="${suite}" destdir="${suite}" style="${basedir}/lib/xsltunit_report.xsl">
  <include name="*${test_result_suffix}" />
  <param name="testname" expression="${style}" />
</xslt>
My approach contains a lot of "convention over configuration". For example I assume that each template ${style} is located inside its own folder ${suite}. The given Ant target first transforms sample data with the template under test for manual review, then applies the test case to the template and finally generates a readable report out of the test result. The first and last steps are optional but useful during development. In the end the test result is checked for the text outcome="failed", which indicates a failed assertion.
<dirname property="suite" file="${testResult}" />
<loadfile property="test_failed" srcFile="${testResult}" />
<fail message="Test ${suite} failed!">
  <condition>
    <contains string="${test_failed}"
              substring="outcome=&quot;failed&quot;" />
  </condition>
</fail>
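For readers who want to run the same check outside of Ant, here is a minimal Java sketch of it. It only mirrors the `<contains>` condition above; the class name `TestResultCheck` and its command line usage are assumptions made for illustration and are not part of the build.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the build.xml described in the post.
class TestResultCheck {

    // Same marker that the Ant <contains> condition searches for.
    static final String FAILED_MARKER = "outcome=\"failed\"";

    /** Returns true if the test result text reports at least one failed assertion. */
    static boolean hasFailure(String testResultXml) {
        return testResultXml.contains(FAILED_MARKER);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: TestResultCheck <test result file>");
            return;
        }
        Path testResult = Paths.get(args[0]);
        String content = new String(Files.readAllBytes(testResult), StandardCharsets.UTF_8);
        if (hasFailure(content)) {
            // Mirrors the message of the <fail> task.
            throw new IllegalStateException("Test " + testResult.getParent() + " failed!");
        }
        System.out.println("no failed assertions in " + testResult);
    }
}
```

Like the Ant version, this is a plain substring search and does not parse the XML.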
Testing and checking the result is done for all templates in the source folder ${src} by using <foreach> from Ant-Contrib.
<foreach target="-testFolder" param="foreach">
  <path>
    <fileset dir="${src}">
      <include name="**/*.xsl" />
      <exclude name="**/${test_prefix}*.xsl" />
    </fileset>
  </path>
</foreach>

<foreach target="-verifyResult" param="foreach">
  <path>
    <fileset dir="${src}">
      <include name="**/*${test_result_suffix}" />
    </fileset>
  </path>
</foreach>
The complete build.xml is here.

FizzBuzz Kata
After the unit tests were executed by Continuous Integration for each commit, I went for some XSLT exercise. Unfortunately adding the Ant support had taken too much of my scheduled learning time and I had to work with the smallest kata available, the FizzBuzz question: Write a program that prints the numbers from 1 to 100. But for multiples of three print "Fizz" instead of the number and for the multiples of five print "Buzz". For numbers which are multiples of both three and five print "FizzBuzz". I wrote the following template, guided by a few XSLTunit tests.
<xsl:template match="value" mode="fizzbuzz">
  <xsl:choose>
    <xsl:when test="number(current()) mod 15 = 0">FizzBuzz</xsl:when>
    <xsl:when test="number(current()) mod 3 = 0">Fizz</xsl:when>
    <xsl:when test="number(current()) mod 5 = 0">Buzz</xsl:when>
    <xsl:otherwise><xsl:value-of select="." /></xsl:otherwise>
  </xsl:choose>
</xsl:template>
To solve the complete question I applied the template to the values 1 to 100,
<numbers>
  <number><value>1</value></number>
  <number><value>2</value></number>
  <number><value>3</value></number>
  <number><value>4</value></number>
  ...
  <number><value>100</value></number>
</numbers>
which generated an HTML file with the answer to the FizzBuzz question.
  • for 1 say 1
  • for 2 say 2
  • for 3 say Fizz
  • for 4 say 4
  • ...
  • for 100 say Buzz
(Source at Bitbucket.)
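For comparison, the decision logic of the template can be sketched in Java. This is not the code from the repository, just the same three tests in the same order as the `<xsl:when>` branches; the class and method names are made up for this illustration.

```java
// Hypothetical Java counterpart of the XSLT template above.
class FizzBuzz {

    /** Maps a number to its FizzBuzz answer, checking 15 first like the first xsl:when. */
    static String say(int value) {
        if (value % 15 == 0) {
            return "FizzBuzz";
        }
        if (value % 3 == 0) {
            return "Fizz";
        }
        if (value % 5 == 0) {
            return "Buzz";
        }
        return String.valueOf(value); // corresponds to xsl:otherwise
    }

    public static void main(String[] args) {
        // Same range as the <numbers> input document.
        for (int i = 1; i <= 100; i++) {
            System.out.println("for " + i + " say " + say(i));
        }
    }
}
```

The order of the checks matters in both versions: if the test for 15 came last, multiples of 15 would already have been answered with "Fizz".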

4 May 2013

Unifying Social-Media Contacts

A group of colleagues is collecting stories about successful automation and asked me to share some of my experiences to help raise awareness of the topic. Here is one of my stories. (It targets a broader audience than my usual, software development related posts, so it is more verbose in explaining things.)

Background
I use several social applications like Facebook or Twitter. Recently I needed to get started with LinkedIn to connect with some people that were not using any other services like Xing or Google+. I registered for LinkedIn and got connected. Of course some of my friends also had a profile on LinkedIn and I started to look for them. I had more than 300 contacts spread among all major platforms with a certain degree of duplication, so working through my contacts on each platform was cumbersome. I needed a consolidated, unified and complete list of all my contacts to search for them one after another.

Automation Automation Automation
I am a developer so my first choice was to create some kind of script. (A script is just a tiny application but the word "script" implies a rougher state, something less polished and sort of unfinished.) I did not bother to look for tools that consolidate social media contacts. I am sure there are tools available that collect contacts from Facebook and LinkedIn, but there are so many platforms that these tools will never be exhaustive. I started with Twitter's REST API which worked great. But not all platforms offered such an API and some of these APIs seemed overly complex to me (read OAuth). I just wanted a quick (and dirty ;-) way to collect my contacts, not a complete, JSON- and XML-consuming monster application, so I dropped the idea. I needed a different approach.

Selenium to the Rescue
Using the browser I was able to display and navigate my contacts on each platform with a few mouse clicks. So I decided to automate my browser. I chose Selenium because it is a powerful tool built to test web applications. Getting started with Java and Selenium WebDriver was easy and I quickly created a prototype for Twitter. It executed the following steps:
public void collectNames() {
   openFirstPage();
   iteratePages();
}
It opened Twitter's following page, https://twitter.com/following, waited for the browser to finish loading
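protected void openFirstPage() {
   String friendsPage = getFriendsPageUrl();
   driver.get(friendsPage);
   try {
      Thread.sleep(WAIT_MS); // crude wait for the page to finish loading
   } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the interrupt flag
   }
}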
and collected the names of my Twitter friends by extracting the text formatted with a certain CSS class. (CSS is an annotation that is used in web development to style the appearance of text in the browser. Selenium is able to select elements of a web page based on their style.)
private void iteratePages() {
   boolean hasNextPage = true;
   while (hasNextPage) {
      scrapFriendNames();
      hasNextPage = openNextPage();
   }
}
protected void scrapFriendNames() {
   By cssSelector = By.cssSelector(getFriendsSelector());
   List<WebElement> nameElements = driver.findElements(cssSelector);
   for (WebElement e : nameElements) {
      addName(e.getText());
   }
}
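The snippets do not show addName(). Since the goal was one consolidated list without duplicates, one plausible way to write it is sketched below. The class name, the trimming and the case-insensitive comparison are assumptions for illustration, not the original code.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical collector for the scraped names; not the original implementation.
class ContactNames {

    // Sorted set with case-insensitive ordering, so "Jane Doe" and "jane doe"
    // collapse into one entry (the first spelling that was added is kept).
    private final Set<String> names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

    /** Adds one scraped name, ignoring empty texts and surrounding whitespace. */
    void addName(String rawName) {
        if (rawName == null) {
            return;
        }
        String name = rawName.trim();
        if (!name.isEmpty()) {
            names.add(name);
        }
    }

    Set<String> getNames() {
        return names;
    }

    public static void main(String[] args) {
        ContactNames contacts = new ContactNames();
        contacts.addName("Jane Doe");   // made-up sample names
        contacts.addName(" jane doe ");
        contacts.addName("Bob Smith");
        for (String name : contacts.getNames()) {
            System.out.println(name);
        }
    }
}
```

Feeding the names of all platforms into the same collector yields the unified, alphabetically sorted list. Contacts who use different names on different platforms would still show up twice.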
Extending to Other Platforms
The only differences between platforms were the URLs of the contacts pages (getFriendsPageUrl()) and the (CSS) styles of the names of the contacts displayed (getFriendsSelector()), e.g. Google+ used "div.MN.MQ.abJ" where Facebook used ".fsl.fwb.fcb". As long as a social network displayed my contacts as a list of one or more pages, the code could handle it. This approach worked for Facebook, GitHub, Goodreads, Google+, LinkedIn, Twitter, Vimeo, Xing and even IBM Connections.
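A minimal sketch of how such per-platform differences can be isolated: an abstract base class declares the two methods and each platform overrides them. The class names are hypothetical. The two selectors are the ones quoted above, while the page URLs are passed in from outside because the post does not state them.

```java
// Hypothetical class layout; only the two method names and the selectors come from the post.
abstract class ContactsSource {

    /** URL of the first page listing the contacts on this platform. */
    abstract String getFriendsPageUrl();

    /** CSS selector matching the elements that contain the contact names. */
    abstract String getFriendsSelector();

    String describe() {
        return getFriendsPageUrl() + " -> " + getFriendsSelector();
    }

    public static void main(String[] args) {
        // Placeholder URLs for the demo; the real ones are not given in the post.
        ContactsSource[] sources = {
            new FacebookContacts("<Facebook friends page URL>"),
            new GooglePlusContacts("<Google+ circles page URL>")
        };
        for (ContactsSource source : sources) {
            System.out.println(source.describe());
        }
    }
}

class FacebookContacts extends ContactsSource {
    private final String url;

    FacebookContacts(String url) {
        this.url = url;
    }

    @Override
    String getFriendsPageUrl() {
        return url;
    }

    @Override
    String getFriendsSelector() {
        return ".fsl.fwb.fcb";
    }
}

class GooglePlusContacts extends ContactsSource {
    private final String url;

    GooglePlusContacts(String url) {
        this.url = url;
    }

    @Override
    String getFriendsPageUrl() {
        return url;
    }

    @Override
    String getFriendsSelector() {
        return "div.MN.MQ.abJ";
    }
}
```

Such selectors are tied to the markup of the site at the time of writing and need to be looked up again whenever a platform changes its pages.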

Limits of Automation
I did not touch authentication, but relied on the cookies stored in my browser. By default Selenium uses its own browser profile, so I had to point it to my own profile instead.
String PROFILE =
   "C:\\Users\\codecop\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\8izc0dg4.default";

private void initDriver() {
   webDriver = new FirefoxDriver(new FirefoxProfile(new File(PROFILE)));
}
To use this approach I had to make sure I had recently logged in to all of the social platforms and accepted their cookies. Afterwards, when Selenium used Firefox to browse to Facebook, it would send these cookies and receive my contacts page immediately.

Lessons Learned
Let me repeat the key points from this experience:
  • Automation can be worthwhile even for one-time tasks if they are repetitive.
  • Browser automation tools like Selenium or Watir are useful not only for testing web applications but also for automating any interaction with web sites.
  • When automating web interaction, keep it simple. The simpler the criteria for navigating and extracting content is, the less likely it is to break when small things change on the web.
  • Start with a rough prototype for some of the actions you want to automate. A little bit of automation is better than none. You can always come back and continue if needed.
  • Do not push the automation too far. Some things are easier to automate than others. Manual preparation up front is an option.

22 March 2013

Twitter Command Line Backup

I used to be a low volume Twitter user. I would not connect with people having more than 1000 tweets because they seemed too "high-volume" for me. But given time the number of my tweets rose as well and I managed to create a decent tweet now and then.

Saving My Tweets
As soon as I noticed that my tweets were "piling up", I thought about backing them up. This would have to work incrementally as older tweets might vanish from my timeline. I searched for an online tool, but did not find anything useful. While searching for offline tools, I found TwitterBackup by Johann Burkard. It is a great tool and does exactly what I need, incremental backup of tweets. It has a small user interface and works out of the box. I recommend it if you like using UIs. (Note that the password input field in the main window is not used, you do not have to provide your password there.)

GUIs are for wimps ;-)
I prefer to start my tools from the command line, especially if I plan to run them periodically. Fortunately Johann Burkard provided his TwitterBackup under an MIT license, so I forked the source and "mavenized" the project. Johann had kept his logic separated from the UI which made it easy to remove the user interface and add command line parameters instead. A few days later I added support to back up favorites and retweets as well. The current version supports the following commands:
E:\>java -jar twitter-backup-cli-3.1.8.2-jar-with-dependencies.jar -h
usage: twitter-backup-cli [-u <twitter handle>] [-f <backup file>]
Backup Twitter Tweets with TwitterBackup (command line).
 -f,--file <arg>         File to save tweets to (saved to system)
 -fv,--favorites         Load favorites instead of tweets (default=tweets)
 -h,--help               Print this usage information
 -o,--port <arg>         HTTP proxy port for web access
 -p,--proxy <arg>        HTTP proxy URL for web access (saved to system)
 -r,--reset-preferences  Do not load preferences from system
 -rt,--retweets          Also load retweets in timeline (default=false)
 -si,--sinceId <arg>     Load tweets/favorites since the given id (default=all)
 -t,--timeout <arg>      Timeout in ms between calls to Twitter (default=10500)
 -u,--username <arg>     Twitter username to load tweets from (saved to system)
For general instructions, see Johann's website at: http://is.gd/4ete
Note that both the original and the new version save some of their parameters to the java.util.prefs.Preferences, so you need to provide your credentials only once.
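How a value can be remembered between runs with java.util.prefs.Preferences is sketched below. This is not TwitterBackup's code: the node name, the key name and the class are assumptions chosen for illustration. Only put() and get() are used, both part of the standard Preferences API.

```java
import java.util.prefs.Preferences;

// Hypothetical sketch of "saved to system" parameters; not the tool's actual code.
class SavedSettings {

    // Assumed key name; the keys used by TwitterBackup are not shown in the post.
    static final String KEY_USERNAME = "username";

    private final Preferences prefs;

    SavedSettings(Preferences prefs) {
        this.prefs = prefs;
    }

    /**
     * Uses the command line value if one was given and remembers it for the next run,
     * otherwise falls back to the stored value (null if nothing was stored yet).
     */
    String resolveUsername(String fromCommandLine) {
        if (fromCommandLine != null) {
            prefs.put(KEY_USERNAME, fromCommandLine);
            return fromCommandLine;
        }
        return prefs.get(KEY_USERNAME, null);
    }

    public static void main(String[] args) {
        // Assumed node name below the per-user preferences root.
        Preferences node = Preferences.userRoot().node("twitter-backup-sketch");
        SavedSettings settings = new SavedSettings(node);
        String given = args.length > 0 ? args[0] : null;
        System.out.println("username: " + settings.resolveUsername(given));
    }
}
```

Where the values end up depends on the platform, for example the registry on Windows or files in the user's home directory on Linux. An option like -r from the usage text above would correspond to skipping the get() fallback.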

Usage Patterns
Download the binary Jar and provide your Twitter handle, e.g. -u codecopkofler. Use -f tweets.xml -rt to back up all tweets and retweets and use -f favorites.xml -fv to save all your favorites. Finally decide where the backup should start, e.g. -si 286077755567779841. The value 286077755567779841 is Twitter's id of my first tweet in the year 2013, whereas 152504278093795328 was my first in 2012 and so on. With -si I separate my tweets into yearly backup chunks.
Loading Tweets
read 27 tweets from tweets.xml
loading http://api.twitter.com/1/statuses/user_timeline.xml?...
9 new tweets downloaded, 36 total
waiting 10500ms
loading http://api.twitter.com/1/statuses/user_timeline.xml?...
no new tweets found
saving backup to tweets.xml
Loading Favorites
read 0 tweets from favorites.xml
loading http://api.twitter.com/1/favorites.xml?...
1 new tweets downloaded, 1 total
waiting 10500ms
loading http://api.twitter.com/1/favorites.xml?...
no new tweets found
saving backup to favorites.xml
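When choosing a value for -si it can help to see when a given tweet id was created. The sketch below assumes the id is one of Twitter's "Snowflake" ids, which carry a millisecond timestamp in the bits above the lowest 22, counted from Twitter's own epoch. The class is hypothetical and not part of twitter-backup-cli.

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical helper for inspecting tweet ids; assumes Snowflake ids.
class TweetIdDate {

    // Start of Twitter's id clock in milliseconds since the Unix epoch.
    static final long TWITTER_EPOCH_MS = 1288834974657L;

    /** Extracts the creation time encoded in a Snowflake tweet id. */
    static Instant createdAt(long tweetId) {
        // The lowest 22 bits hold worker and sequence numbers, the rest is the timestamp.
        return Instant.ofEpochMilli((tweetId >> 22) + TWITTER_EPOCH_MS);
    }

    /** Year (in UTC) in which the tweet with the given id was created. */
    static int yearUtc(long tweetId) {
        return createdAt(tweetId).atZone(ZoneOffset.UTC).getYear();
    }

    public static void main(String[] args) {
        // Defaults to the -si value used in the text above.
        long id = args.length > 0 ? Long.parseLong(args[0]) : 286077755567779841L;
        System.out.println(id + " was created at " + createdAt(id) + " (UTC)");
    }
}
```

Ids issued before Twitter introduced this id format do not contain a timestamp, so the calculation is only meaningful for newer tweets.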