Data roundup, March 6

March 6, 2013 in Data Roundup

We’re rounding up data news from the web each week. If you have a data news tip, send it to us at [email protected]

Photo credit: Daniel Amarai


There were many conferences for data journalists and data storytellers this past week. The biggest of these was the National Institute for Computer-Assisted Reporting’s 2013 conference. Posts about NICAR 2013 sessions are up on the conference’s blog. You can check out Chrys Wu’s collection of slides, tutorials, and tools from the event, as well as the slides and code from Hadley Wickham’s day-long R course.

The first annual Tapestry Conference, a day-long invitation-only conference “designed to advance interactive online storytelling with data”, was also held this past week. The slides for Jonathan Corum’s opening keynote address are available online. Ben Jones summarizes some of the discussion that took place there.

The humanitarian use of big data was the subject of Patrick Meier’s keynote address at the Social Media and Response Management Interface Event. A curated list of every project and concept referred to in his talk has been posted.

An intensive one-day introduction to open data for journalists from the OKF’s Lisa Evans and Kathryn Corrick from the Open Data Institute has just opened for registration. The jointly run course is due to be held in London on April 23 and will teach journalists what open data is, where to find it, and how to use it to tell stories.

A new guide has been posted to the ProPublica Nerd Guides, showing you how to bulletproof your data. Noah Veltman has also written a blog post on how not to screw up your data. Both walk you through verifying your data’s integrity and making it ready for publication.

Beta version 2.0 of CKAN, the Open Knowledge Foundation’s open source data portal software, has arrived. New features include a simplified authorization model, followable activity streams, and a switch to Jinja2 templates.

Machine learning produces models which are effective but difficult to understand. MLDemos, “an open-source visualization tool for machine learning algorithms”, may help solve that problem by making the effects of parameters on machine learning algorithms visible.

A new blog post on visualizing neural networks made with the R package nnet also addresses this problem. The post shows how to draw a “neural interpretation diagram” to illuminate some of the structure of trained neural nets.

Dedupe is a Python library that efficiently finds and eliminates duplicate entries in large sets of “real-world” data. How does it work? The Dedupe wiki explains in detail.
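To give a feel for the kind of problem Dedupe tackles, here is a minimal sketch of near-duplicate detection using only Python's standard library. It is not Dedupe's API or its algorithm: the function names, the sample records, and the 0.85 threshold are all assumptions made for illustration, and the pairwise comparison would be far too slow for the large datasets Dedupe is built for.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(record.lower().split())


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def drop_near_duplicates(records, threshold=0.85):
    """Keep the first record of each group of near-duplicates.

    Every record is compared against those already kept, so this is
    quadratic in the number of records and only suitable for small lists.
    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    kept = []
    for record in records:
        if all(similarity(record, seen) < threshold for seen in kept):
            kept.append(record)
    return kept


# Hypothetical sample data: two messy variants of the same name.
records = [
    "Open Knowledge Foundation",
    "open  knowledge foundation",
    "Open Knowledge Foundaton",
    "Open Data Institute",
]
unique = drop_near_duplicates(records)
print(unique)
```

In this sample the differently spaced and misspelled variants of the first name are dropped, while the unrelated second organization is kept. Real-world data usually needs field-by-field comparison and a way to avoid comparing every pair, which is where a dedicated library comes in.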


March 4 was election day in Kenya. Anahi Ayala Iacucci critically reviews the large number of “branded, advertised, public electoral monitoring system[s]” produced in anticipation of the election.

Connected China is a monumental new Reuters project visualizing political organization and power networks in China. Connected China breaks down both social and institutional power structures and maps the path to political power.

Open Data Day will be remembered as the beginning of the Nepalese open data movement. A blog post reflects on the event and points to work by YoungInnovations and OpenDRI.

OKF France has launched Open Transition Energie, fostering the national debate on the French transition to renewable energy by providing a place to share and visualize data. Besides the website, Open Transition Energie is also a thematic group on the NosDonné data portal.

Another project presents visualizations and analysis of the use of public money in Moldova. Its visualization of the 2013 Moldovan budget makes use of tools from the OKFN’s OpenSpending project.

A (German-language) roundup of February’s data journalism highlights has also been put together. These include the AG Panda Projekt and many things that will look familiar to data roundup readers.

Density in geospatial data can be visualized with a technique borrowed from maps of mountains. Chris Herwig shows how this method gives insight into two racially charged datasets: the NYPD’s database of “stop-and-frisk encounters” and US locations with racist names.


Derek Watkins has created a D3.js-based interface “to ease the pain of downloading elevation data from the Shuttle Radar Topography Mission”, crucial data for mapping mountainous terrain. Click a tile to download a 6000×6000 GeoTIFF file, and then follow Watkins’s links to Development Seed and Linfiniti to learn what to do with it.

Japan’s government has announced its plans to launch a national data portal, continuing the country’s move toward data transparency. Details concerning the date of the launch are unfortunately not yet available.

The team at BigML has shared their long list of open data sources. This includes machine learning datasets, data marketplaces, government open data sources, and more.

The Sri Lanka Disaster Risk Information Platform is a new data portal built on GeoNode by the Disaster Management Center. Users can both download maps and data and contribute their own.
