Data roundup, July 10

July 10, 2013 in Data Roundup

We’re rounding up data news from the web each week. If you have a data news tip, send it to us by email.

Photo credit: Zeinab Mohamed

You have four more days to apply for the OKCon 2013 travel bursaries. These grants support travel to Geneva for the Open Knowledge Conference, which takes place September 16–18. They cover transport, accommodation, and conference lunches.

If you missed the SciPy 2013 conference and its awesome tutorials, don’t worry. It’s okay. There are plenty of videos of the SciPy tutorials available to watch on the conference website.

Remember how the UK Land Registry announced it would be releasing new open data sets a little while ago? Now they’re marking the release of that data by opening the Open Data Challenge, a contest that will give three £3,000 awards to developers who show how UKLR data can make a positive impact on the UK economy.

The ScraperWiki platform has come out of beta. ScraperWiki is a handy web service for “liberating data from silos and empowering you to do what you want with it”. Wondering what’s new about it? Check out the FAQ, as well as the blog post announcing the FAQ, which features a neat visualization of #scraperwiki Twitter activity.

Last week you learned about the Poderopedia platform, a system for tracing networks of political and corporate power. Learn more about Poderopedia in a detailed discussion by its creator, Miguel Paz, at Mozilla Source.

Friedrich Lindenberg discusses applications of text mining in investigative journalism, providing an overview of useful tools and techniques for crunching large collections of documents to unveil hidden insights.

Tuanis is a free tool created by Matthew Caruana Galizia to automate the construction of choropleth maps in the newsroom. Loading data into Tuanis is as simple as creating a Google spreadsheet and publishing it to the web. The project is explained in a post on the author’s blog.

Mise à journalisme has published a thorough review of the 20 best data visualization tools for use in the newsroom. Excluded from the list are apps that are unfree, ugly, or written in Flash (thank goodness). The remaining list contains something for every level of experience.


“Can Twitter provide early warning signals of growing political tension in Egypt and elsewhere?” Patrick Meier and colleagues have analyzed some 17 million Egyptian tweets and developed “a Political Polarization Index that provides early warning signals for increased social tensions and violence”. Their striking finding is that outbreaks of violence correspond to periods of high polarization.

“We might be able to do better at conflict resolution,” says researcher Jonathan Stray, “with the help of good data analysis.” Watch Stray’s IPSI Symposium talk on data in conflict resolution, then follow up with Erica Chenoweth’s talk “Why Civil Resistance Works”, which he cites as exemplary.

UNHCR, the UN Refugee Agency, has produced an interactive map of historical refugee data, visualizing changes in the world’s refugee communities over the past five decades.

It’s been said that we’ve lately “had to face the awful conclusion that the Internet itself is one giant automated Stasi”. But how does the NSA’s data collection actually compare to that of the Stasi? See for yourself: this helpful Stasi vs. NSA map compares the size of the Stasi filing system with the size of the filing system that would be needed to store the NSA’s data.

Watch_Dogs is a video game in which “the City of Chicago is run by a Central Operating System [that uses] data to manage the entire city and solve complex problems”. As the creators of the game note, this is not really a science fiction scenario. The game’s promotional website WeareData illustrates the extent of publicly available data on Paris, London, and Berlin in a fairly sinister manner.

You can do all sorts of crazy stuff with D3.js. You can, for example, use it to brute-force puzzles, as Ben Best explains at length. Learn how to find buried treasure with data-driven graphics and some simple mathematical reasoning.

DATA SOURCES

A new daily data feed for the US Treasury offers “the first-ever electronically-searchable database of the Federal government’s daily cash spending and borrowing”. It is lovingly documented with a “data dictionary” explaining the structure and meaning of the hosted data.

The International Conference on Weblogs and Social Media now provides “a hosting service for new datasets used by papers published in the proceedings of the annual ICWSM conference”. These include datasets for research on sentiment extraction, social network analysis, and more.

The “Nonviolent and Violent Campaigns and Outcomes” dataset is “a multi-level data collection effort that catalogues major nonviolent and violent resistance campaigns around the globe from 1900-2011”. It has been described as “invaluable for understanding non-violence”.

Later this summer, the US National Atlas program will release “Natural Earth relief/land cover data […] intended as background bases for general purpose mapmaking”. Samples of the forthcoming data are available for download.

Flattr this!