Data roundup, January 30

January 30, 2013 in Data Roundup

We’re rounding up data news from the web each week. If you have a data news tip, send it to us at [email protected].

Photo credit: USAID/Alex Kamwera

Scott Murray, author of the forthcoming O’Reilly D3.js book, has written “Data-Driven Documents, Defined”, a D3.js tutorial aimed at journalists. Murray’s tutorial introduces basic D3 concepts by leading you through the creation of a simple dot chart.

In R news, a new CRAN Task View on “Handling and Analyzing Spatio-Temporal Data” was released this week. It provides extensive guidance on the representation, analysis, and visualization of data that includes both location and time in R. Version 1.0 of the R devtools package has also been released, promising to simplify the process of developing, testing, and distributing R packages.

If you’ve ever wanted a tool like Make specialized to help manage your data projects, check out Drake, a new command line tool described by its creators as a “Make for data”. Drake happens to be written in my favorite JVM programming language, Clojure.
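The core idea behind a “Make for data” is simple: rebuild an output file only when it is missing or older than the inputs it depends on. As a rough illustration of that dependency model (this is a hypothetical Python sketch, not Drake’s actual workflow syntax), it might look like:

```python
import os

def needs_rebuild(output, inputs):
    """True if `output` is missing or older than any of its `inputs`."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)

def run_step(output, inputs, build):
    """Run `build` only when the output is stale, like a Make rule."""
    if needs_rebuild(output, inputs):
        build(output, inputs)
        print(f"built {output}")
    else:
        print(f"{output} is up to date")
```

A real tool like Drake adds a declarative workflow file on top of this timestamp check, so a chain of data-cleaning and analysis steps reruns only the parts whose upstream data has changed.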

Syracuse University’s iSchool is offering a free online course in data science. This course will provide an introduction to data analysis and visualization with the R programming language. Registration is limited to the first 500 students, so sign up soon.

Data journalist Simon Rogers has a new blog, and he has wasted no time posting lots of goodies there. These include his collection of 22 key data journalism links, comprising many handy tools for making visualizations, maps, and timelines.

Can you predict the auction price of a bulldozer? Prove it and win $10,000. Fast Iron is holding a contest to predict the sale price of a piece of heavy equipment from data giving its properties. Submissions are due April 10.


The Code4Kenya project, a five-month pilot project aiming “to create a sustainable ecosystem around the Kenya Open Data Initiative”, launched its first batch of newsroom-based data products on January 24. These four portals present data on the 2013 election, urban crime, education, and health in Kenya.

The process of telling a story with data was given a number of walkthroughs in blog posts this week. One post explains how to use MapBox.js to create a live visualization of U.S. drone strikes. Another details the creation of an “armchair politologist” app for the Czech presidential election. Yet another explains how the BBC uses NLP and linked data to create news content.

A recent Le Monde blog post by Emilienne Malfatto and Jelena Ptorić provides an object lesson in data storytelling. Their post articulates the problem of the high suicide rate in French prisons through effective maps, charts, and visualizations, showing that prison overpopulation is not correlated with suicide rate.

The Berlin International Airport debacle has been illustrated in an interactive photographic timeline by Julius Tröger. Watch the airport’s farcical series of delays unfold before your very eyes, with helpful caption text provided by Tröger.

Twitter has published the second installment of its biannual transparency report, exploring six months of data on government interference and DMCA takedown requests, on a new dedicated site. The new site is intended to make transparency data “more meaningful and accessible to the community at large”.

Prepare to lose many hours of your life to the new dedicated graphics page launched by Canada’s National Post. The page is a staggering compilation of the Post’s visual journalism and data storytelling.

Explore WNYC’s delightful map of New York City’s dogs. “Many of us have tried a dog-related data viz project,” tweets Toronto data journalist Patrick Cain; “WNYC’s is superb.”


Simon Rogers, on a roll with his new blog, has released his collection of 16 border files for use in Google Fusion Tables maps.

A dataset on all liquor license holders in British Columbia has been released by DataBC. Data on BC liquor sales previously released by DataBC served as the basis of a recent map and article published by Global News.

An open repository of web data is in the works at a new nonprofit called Common Crawl. Common Crawl’s crawled archive of the web so far spans 6 billion web pages plus their metadata (some 81 terabytes of data), available for bulk download through Amazon’s Public Data Sets.

“Bulk access to [U.S.] government information including high-resolution videos, IRS nonprofit tax returns, and older archives of other government databases” is now available.

A nice collection of open data resources for Europe and the Netherlands has been posted by Matteo Manferdini of PureCreek, gathered from his Appsterdam lecture “Who’s Who of Open Data”.
