Data roundup, April 3

April 3, 2013 in Data Roundup

We’re rounding up data news from the web each week. If you have a data news tip, send it to us at [email protected].

Photo credit: Julian Dobson


The Open Knowledge Foundation and P2PU are partnering up to offer a data explorer mission MOOC. In this 4-week course, which will run from April 15 to May 3, you can band together with others to learn the basics of digging into data and telling a story with it.

The Source continues to roll out great OpenNews Learning materials, including this week’s piece by Jacob Harris, “How the Data Sausage Gets Made”. This piece delves into the process of collecting, cleaning, and “interviewing” data to answer questions about food safety in the US.

Beautiful Trouble is “a book and web toolbox that puts the accumulated wisdom of decades of creative protest into the hands of the next generation of change-makers”. The book can be navigated by means of an innovative interactive interface.

As reported here a couple of weeks ago, the Iberoamerican Data Journalism Handbook is in development. A Brazilian Portuguese translation of the Data Journalism Handbook is now also underway, thanks to a team of “over 30 Brazilian journalists and students”.

Vega is a “visualization grammar” built on top of D3.js, allowing you to create data visualizations declaratively by editing a JSON file. Whereas D3.js is a low-level kernel for data visualization, Vega is a high-level language that abstracts away the gritty programmatic details.
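To make the declarative idea concrete, here is a minimal sketch in Python. The spec format and the `render` function are hypothetical and invented for illustration; this is not Vega's actual schema or API. The point is only that the chart is described as data (JSON), and a separate renderer decides how to draw it.

```python
import json

# Hypothetical spec format for illustration only; NOT Vega's actual schema.
SPEC_JSON = """
{
  "mark": "bar",
  "data": [
    {"label": "A", "value": 3},
    {"label": "B", "value": 7},
    {"label": "C", "value": 5}
  ],
  "encode": {"category": "label", "length": "value"}
}
"""


def render(spec):
    """Turn a declarative spec into text bars; the spec author writes no drawing code."""
    if spec["mark"] != "bar":
        raise ValueError("this sketch only knows how to draw bars")
    category_field = spec["encode"]["category"]
    length_field = spec["encode"]["length"]
    # One output line per data row: the category name followed by a bar of '#'.
    return [f"{row[category_field]} {'#' * row[length_field]}" for row in spec["data"]]


spec = json.loads(SPEC_JSON)
lines = render(spec)
print("\n".join(lines))
```

Changing the chart means editing the JSON (new rows, different field names) rather than touching `render`, which is roughly the division of labor a visualization grammar aims for.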

In a similar spirit, DexCharts is a library of reusable charts for D3.js. It will come in handy for anyone who would rather not reinvent the wheel just to make a bar chart using D3.

healthvis is a new R package that will allow you to produce D3-based visualizations as easily as normal R plots. Visualizations created with healthvis are hosted online on a Google App Engine server.

Slides and code from Lynn Cherny’s PyData 2013 talk on using Nodebox OpenGL for data visualization are available. Nodebox gives you Processing-like graphics capabilities in a Pythonic environment.


The ongoing electricity crisis in Cameroon subjects Cameroonians to power outages lasting days or weeks. Journalism++ visualizes the unfolding of the crisis from August 2012 to present using CartoDB on the basis of (incomplete!) data from “all complaints received by AES SONEL on its toll-free hotline”. The severity of the crisis and the unavailability of open data motivate the Feowl crowdsourcing platform, “a community-driven platform that produces accurate and actionable public data on the electricity supply in Douala”.

How many people showed support for marriage equality on Facebook from March 25 onward? Facebook investigates the question indirectly by looking at profile picture changes, in hopes of catching responses to the HRC’s suggestion that marriage equality supporters change their profile pictures.

Stamen has produced jaw-dropping 3D maps of San Francisco, New York, London, and Berlin on the basis of Nokia HERE data. They must be seen to be believed. The rationale for the project is given in a recent blog post.

Periscopic, everyone’s favorite socially conscious dataviz group, have created an interactive to answer the question: “How old were they?” “They”, here, refers to artists—film directors, musicians, and novelists—whose output across the years can be explored individually and in aggregate form.

How accurate are claims that 900,000 people in the UK dropped benefit claims “rather than undergo a tough new medical test”? Help Me Investigate Welfare reports on an investigation and republishes its data.

How are Texans raised—literally? The Texas Tribune answers this question by presenting data from 48,242 elevators, including an interactive graphic giving the distribution over age and number of floors.


A complete index of the Guardian’s datasets has been released. If you’ve been wanting to take your own shot at some of the Guardian’s exemplary data reporting, now is your chance.

A new blog post by Samvel Martirosyan gathers together a variety of useful online sources of data on Armenia. These include not only state departments and ministries but also a variety of lesser known sources.

The Zimbabwe Stock Exchange has launched a data portal, facilitating more transparent communication between the ZSE and its stakeholders.

DataMob has released a collection of social datasets for data mining experiments. These include datasets from Reddit, Facebook, and the Pirate Bay (not to mention my alma mater Cornell University).

The Yelp academic dataset is now available. This comprises “all the data and reviews of the 250 closest businesses for 30 universities”, distributed as a single zip file.

The US government’s Consumer Complaint Database has been updated. The update increases the number of complaints “from about 19,000 to more than 90,000”.

Microsoft has published a searchable list of its 40,000+ patents. The list is available as a CSV download, in addition to being accessible via their Patent Tracker.

California has launched a new Geoportal to improve access to geographic data. The new version “touts socially-derived maps”.
