Blogging a race with R

I’ve been blogging extensively about this year’s World Solar Challenge. My main tool for doing so has been R. After I put together a database of teams and team data, a set of R scripts generated web pages (like this one and that one) from it. Using R made it easy to incorporate graphs and analyses of team data within those pages. For example:


Other useful tools included R scripts that extracted additional structured data from the World Solar Challenge web site by XML parsing (using the RCurl and XML packages); R scripts that scanned the Twitter feeds of race teams (calling a Python script to do the actual downloading, because of weaknesses in the R interface to Twitter); and R scripts that generated various maps (mainly using the raster package). One example is this temperature map of Australia in October:

An R script for parsing data from was used to produce (and regularly update) this calendar:

Additional R scripts were used to generate a number of infographics, such as these:


During the race itself, serious data quality problems emerged. The official timing data contained multiple errors, while the GPS tracking data suffered from time lags greater than the gaps between teams. This created a need for code to sanity-check the data, clean it, and extrapolate car positions. R was very useful for writing such tools on the fly. The map below shows raw GPS data for car positions (overlaid on a NASA raster image), and was produced using code written during the race:

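The sanity-checking and extrapolation described above can be illustrated with a small sketch. The race code was written in R and is not shown here; this is a Python sketch under assumptions of my own: each GPS fix is a hypothetical `(timestamp, km_along_route)` pair, any fix implying backwards travel or a speed above a hypothetical `MAX_SPEED_KMH` is discarded, and the current position is extrapolated from the speed between the last two good fixes.

```python
from datetime import datetime, timedelta

# Hypothetical threshold (an assumption, not an official figure): any pair of
# fixes implying a faster speed than this is treated as a bad reading.
MAX_SPEED_KMH = 130.0

def clean_fixes(fixes):
    """Drop GPS fixes that imply an impossible speed or backwards travel.

    `fixes` is a list of (timestamp, km_along_route) tuples sorted by time.
    """
    cleaned = []
    for t, km in fixes:
        if cleaned:
            t0, km0 = cleaned[-1]
            hours = (t - t0).total_seconds() / 3600
            if hours <= 0 or km < km0 or (km - km0) / hours > MAX_SPEED_KMH:
                continue  # skip the suspect fix and keep the last good one
        cleaned.append((t, km))
    return cleaned

def extrapolate(fixes, now):
    """Estimate km along the route at `now` from the last two good fixes."""
    (t0, km0), (t1, km1) = fixes[-2], fixes[-1]
    speed_kmh = (km1 - km0) / ((t1 - t0).total_seconds() / 3600)
    lag_hours = (now - t1).total_seconds() / 3600  # age of the newest fix
    return km1 + speed_kmh * lag_hours

if __name__ == "__main__":
    start = datetime(2015, 10, 19, 8, 0)  # made-up example timestamps
    fixes = [
        (start, 0.0),
        (start + timedelta(hours=1), 80.0),
        (start + timedelta(hours=1, minutes=30), 400.0),  # implausible jump
        (start + timedelta(hours=2), 160.0),
    ]
    good = clean_fixes(fixes)
    print(len(good), extrapolate(good, start + timedelta(hours=2.5)))
```

Running it prints `3 200.0`: the implausible 400 km fix is dropped, and the car is projected 40 km beyond its last good fix. A constant-speed projection is the simplest choice; a real tool would likely also need to account for control stops and overnight halts.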
The chart below summarises official timing data, and was produced using code modified from that used to report on the 2013 race:

This chart of official Cruiser class results was also produced using code modified from that used in 2013:

New code was used to produce this chart of Cruiser class cars that partially trailered:

Many other charts and web pages were produced during race coverage. In each case, R provided useful facilities for acquiring, visualising, and organising data. Generating HTML from R scripts and a database also proved very successful. In hindsight, virtually all blog posts should have been generated this way.
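The idea of generating HTML from scripts and a database can be sketched briefly. The pages themselves came from R; this is a Python sketch using the standard-library `sqlite3` module, and the `teams` table with `name`, `country` and `class` columns is a hypothetical schema, not the one actually used.

```python
import sqlite3
from html import escape

def team_table_html(conn):
    """Render every row of a hypothetical `teams` table as an HTML table."""
    rows = conn.execute(
        "SELECT name, country, class FROM teams ORDER BY class, name"
    ).fetchall()
    lines = ["<table>", "<tr><th>Team</th><th>Country</th><th>Class</th></tr>"]
    for name, country, cls in rows:
        # escape() stops names containing &, < or > from breaking the markup
        lines.append(
            f"<tr><td>{escape(name)}</td><td>{escape(country)}</td>"
            f"<td>{escape(cls)}</td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE teams (name TEXT, country TEXT, class TEXT)")
    # Placeholder rows for illustration only.
    conn.executemany(
        "INSERT INTO teams VALUES (?, ?, ?)",
        [("Team A", "Country X", "Cruiser"), ("Team B", "Country Y", "Challenger")],
    )
    print(team_table_html(conn))
```

Because every page is rebuilt from the database, a correction to one team's data propagates to all pages on the next run, which is the main advantage over hand-edited posts.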

Finally, a few small touches of humour do not go astray. This image, for example, proved quite popular:

WSC Results (3)

Following up on the World Solar Challenge official results, here are the six Cruisers that trailered, plotted by the two most meaningful numbers for such cars: person-kilometres and practicality. The lovely little car from Lodz is clearly the leader of this particular pack and thus, in my opinion, sixth Cruiser overall.

I should also point out this great WSC dataviz by Tiffany Hu, and the superb WSC summary and retrospective by MostDece.