Background
I use several social applications like Facebook or Twitter. Recently I needed to get started with LinkedIn to connect with some people who were not using any other services like Xing or Google+. I registered for LinkedIn and got connected. Of course some of my friends also had a profile on LinkedIn and I started to look for them. I had more than 300 contacts spread across all major platforms with a certain degree of duplication, so working through my contacts on each platform was cumbersome. I needed a consolidated, unified and complete list of all my contacts to search for them one after another.
Automation Automation Automation
I am a developer so my first choice was to create some kind of script. (A script is just a tiny application but the word "script" implies a rougher state, something less polished and sort of unfinished.) I did not bother to look for tools that consolidate social media contacts. I am sure there are tools available that collect contacts from Facebook and LinkedIn, but there are so many platforms that these tools will never be exhaustive. I started with Twitter's REST API, which worked great. But not all platforms offered such an API, and some of these APIs seemed overly complex to me (read: OAuth). I just wanted a quick (and dirty ;-) way to collect my contacts, not a complete, JSON- and XML-consuming monster application, so I dropped the idea. I needed a different approach.
Selenium to the Rescue
Using the browser I was able to display and navigate my contacts on each platform with a few mouse clicks. So I decided to automate my browser. I chose Selenium because it is a powerful tool built to test web applications. Getting started with Java and Selenium WebDriver was easy and I quickly created a prototype for Twitter. It executed the following steps:

    // Thread.sleep() in openFirstPage() throws a checked exception,
    // so it is declared here and passed on to the caller.
    public void collectNames() throws InterruptedException {
        openFirstPage();
        iteratePages();
    }

It opened Twitter's following page, https://twitter.com/following, waited for the browser to finish loading

    protected void openFirstPage() throws InterruptedException {
        String friendsPage = getFriendsPageUrl();
        driver.get(friendsPage);
        // Fixed pause to give the page time to load.
        Thread.sleep(WAIT_MS);
    }

and collected the names of my Twitter friends by extracting the text of all elements with a certain CSS class. (CSS is a language used in web development to style the appearance of a page in the browser. Selenium is able to select elements of a web page by the CSS classes assigned to them.)

    private void iteratePages() {
        boolean hasNextPage = true;
        while (hasNextPage) {
            scrapFriendNames();
            hasNextPage = openNextPage();
        }
    }

    protected void scrapFriendNames() {
        // Find every element that matches the platform's name selector.
        By cssSelector = By.cssSelector(getFriendsSelector());
        List<WebElement> nameElements = driver.findElements(cssSelector);
        for (WebElement e : nameElements) {
            addName(e.getText());
        }
    }

Extending to Other Platforms
The only differences between platforms were the URLs of the contacts pages (getFriendsPageUrl()) and the CSS classes of the displayed contact names (getFriendsSelector()), e.g. Google+ used "div.MN.MQ.abJ" where Facebook used ".fsl.fwb.fcb". As long as a social network displayed my contacts as a list of one or more pages, the code could handle it. This approach worked for Facebook, GitHub, Goodreads, Google+, LinkedIn, Twitter, Vimeo, Xing and even IBM Connections.
Limits of Automation
I did not touch authentication, but relied on the cookies stored in my browser. Selenium uses its own browser profile by default, so I had to point it at my own profile instead.

    String PROFILE = "C:\\Users\\codecop\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\8izc0dg4.default";

    private void initDriver() {
        // Start Firefox with my existing profile so its stored cookies are sent.
        // (Later Selenium versions pass the profile through FirefoxOptions instead.)
        webDriver = new FirefoxDriver(new FirefoxProfile(new File(PROFILE)));
    }

To use this approach I had to make sure I had recently logged in to all of the social platforms and accepted their cookies. Afterwards, when Selenium used Firefox to browse to Facebook, it would send these cookies and receive my contacts page immediately.
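The snippets above call addName() without showing it. The following is a minimal sketch of how such a method could build the consolidated list described in the Background, assuming that two names are the same contact when they match after trimming whitespace and ignoring case. The ContactList class and the sample names are made up for illustration and are not the script's actual code.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not from the original script: merges scraped names into one list. */
public class ContactList {

    // Key: normalized name used to detect duplicates. Value: first spelling seen.
    private final Map<String, String> namesByKey = new TreeMap<>();

    /** Assumed counterpart of the addName() call in the scraping loop. */
    public void addName(String rawName) {
        if (rawName == null) {
            return;
        }
        // Collapse runs of whitespace so "Jane  Doe" and "Jane Doe" match.
        String name = rawName.trim().replaceAll("\\s+", " ");
        if (name.isEmpty()) {
            return;
        }
        // Compare case-insensitively, but keep the first spelling for display.
        namesByKey.putIfAbsent(name.toLowerCase(Locale.ROOT), name);
    }

    /** Returns the consolidated names, sorted by their normalized form. */
    public List<String> getNames() {
        return List.copyOf(namesByKey.values());
    }

    public static void main(String[] args) {
        ContactList contacts = new ContactList();
        // Made-up sample data standing in for names scraped from two platforms.
        for (String name : List.of("Jane Doe", "John Smith")) {
            contacts.addName(name);
        }
        for (String name : List.of("jane doe ", "Alice Jones")) {
            contacts.addName(name);
        }
        contacts.getNames().forEach(System.out::println);
    }
}
```

Matching on the name alone is deliberately crude: it merges obvious duplicates across platforms but cannot tell two different people with the same name apart, which is acceptable for a list that is worked through by hand anyway.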
Lessons Learned
Let me repeat the key points from this experience:
- Automation can be worthwhile even for one-time, repetitive tasks.
- Browser automation tools like Selenium or Watir are useful not only for testing web applications but also for automating any interaction with web sites.
- When automating web interaction, keep it simple. The simpler the criteria for navigating and extracting content are, the less likely they are to break when small things change on the web.
- Start with a rough prototype for some of the actions you want to automate. A little bit of automation is better than none. You can always come back and continue if needed.
- Do not push the automation too far. Some things are easier to automate than others. Manual preparation up front is an option.