Data roundup, April 24

April 24, 2013 in Data Roundup

We’re rounding up data news from the web each week. If you have a data news tip, send it to us at [email protected].

Photo credit: Jason Kuffer



The annual Perugia International Journalism Festival begins today, and the School of Data has helped organize a School of Data Journalism for the festival. Click through to see the School’s panels and workshops, and follow #DDJSchool on Twitter to see discussions of the events.

Crowdcrafting has been launched. Crowdcrafting is a platform built on PyBossa crowdsourcing technology which makes it easy to recruit participants in community science projects en masse.

What are the three elements of successful data visualizations? According to Jim Stikeleather’s post on the Harvard Business Review blog, they reflect understanding of the audience, they set up a clear framework, and they tell a story.

But what does “data storytelling” mean, anyway? Many people have been asking that question lately. Zach Gemignani has put together a comprehensive collection of data storytelling resources which address that question and many, many more.

Marcelo Träsel is crowdsourcing a database of data journalism websites, and he needs your help. Contribute your links and help fix “the absence of a comprehensive database or other resource listing websites and weblogs on visualization, investigative techniques, CAR, and all other newsrooms practices labelled as data journalism”.

Ciara Byrne “signed up to spend a month of immersion in data, hoping to emerge a newly minted data scientist”, and her lessons from a crash course in data science tell the tale of what she learned.

Are you, like many programmers, too lazy to use relational databases for your day-to-day work with small-to-medium datasets? dataset is a Python library that aims to change that. Its motto: “Because managing databases in Python should be as simple as reading and writing JSON files.”
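To give a feel for the convenience being promised, here is a minimal sketch of the underlying idea: storing plain Python dicts in a SQL database without declaring a schema first. It uses only the standard library's sqlite3 module and is not dataset's actual API; the helpers `insert_row` and `find_rows`, and the sample table and values, are hypothetical names made up for this illustration.

```python
import sqlite3


def insert_row(conn, table, row):
    """Insert a dict into `table`, creating the table and any missing columns first."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted by hand.
    # This sketch assumes table and column names come from trusted code.
    cols = list(row)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (id INTEGER PRIMARY KEY)')
    existing = {info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')}
    for col in cols:
        if col not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{col}"')
    names = ", ".join(f'"{c}"' for c in cols)
    marks = ", ".join("?" for _ in cols)
    conn.execute(
        f'INSERT INTO "{table}" ({names}) VALUES ({marks})',
        [row[c] for c in cols],
    )


def find_rows(conn, table, **filters):
    """Return the rows matching all keyword filters as plain dicts."""
    where = " AND ".join(f'"{k}" = ?' for k in filters) or "1"
    cur = conn.execute(f'SELECT * FROM "{table}" WHERE {where}', list(filters.values()))
    names = [d[0] for d in cur.description]
    return [dict(zip(names, r)) for r in cur.fetchall()]


conn = sqlite3.connect(":memory:")
insert_row(conn, "people", {"name": "Ada", "city": "London"})
insert_row(conn, "people", {"name": "Grace", "age": 45})  # "age" column is added on the fly
rows = find_rows(conn, "people", name="Grace")
```

Rows go in and come out as ordinary dicts, and a column that was never set for a row simply comes back as `None`. A library like dataset wraps this kind of bookkeeping for you, so check its own documentation for the real calls.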

If you like Python and have been enjoying the new ease of creating data-driven graphics with Vega, then you’ll love being able to combine the two using Vincent, a Python library that “takes Python data structures (tuples, lists, dicts, and Pandas DataFrames) and translates them into Vega visualization grammar”.

The PLOS Text Mining Collection has gone live. This collection brings together reviews and research published in PLOS journals on the subject of text mining, the art of the retrieval and analysis of information from unstructured text. This initial launch only covers the past two years of publications but will grow over time.

Lisa Williams of Data For Radicals will be holding a data visualization workshop for beginners in Detroit on June 20 – 23. Participants in the workshop will get a personalized introduction to “maps, charts, graphs, and data visualizations”.


The Boston bombings have generated a large number of interactives and visualizations. Visual Loop has collected some of the better ones in the 32nd edition of Interactive Inspiration.

What, meanwhile, do we know about Dzhokhar Tsarnaev from his social media use? Quartz reflects on the way “we reveal immense amounts of information about ourselves publicly, unthinkingly, and sometimes involuntarily”.

If you haven’t been following Données Fleuries, Nicolas Patte’s weekly (French-language) review of excellence in data visualization, this week’s edition is a good time to start. DF #11 touches on the use of Raphaël.js, Twitter cartograms, and more.

New York City’s subways are an interesting window into wealth inequality in the city. The New Yorker has produced an interactive infographic showing how each subway line winds its way through the peaks and valleys of NYC’s wealth; Noah Veltman has produced a neat visual variant on the New Yorker map. These maps will remind many of an earlier map which illustrated the impact of fare hikes on different NYC neighborhoods.

Moritz Stefaner takes a close look at gender balance in data visualization conferences. There is, he concludes, still a ways to go before a real balance is achieved.

Canada’s Global News has finally processed 318 PDFs of census survey responses from Toronto students and is producing an ongoing series of works based on the data. One recent piece asks: how safe do Toronto students feel?


The Philippine Agriculture Department has opened a new data portal, the Department of Agriculture Accountability Network (DAAN). DAAN aims to promote public awareness of the DA’s projects and to increase transparency with respect to its funding and other relevant data.

BISON (ostensibly standing for “Biodiversity Information Serving Our Nation”) is a new portal giving access to United States species occurrence data, with more than 100 million occurrence records in its datasets.

In time for Earth Day, the federal government of Canada and provincial government of Alberta have launched a new portal for environmental data from Alberta’s oilsands. The portal is intended to address criticisms of the government’s secrecy about the environmental impact of the oilsands.

The source code for the data portal for the Spanish government’s Catalogue of Public Information has been made available in an open form. The portal architecture can now be freely reused for new open data projects.
