Data roundup, July 3

July 3, 2013 in Data Roundup

We’re rounding up data news from the web each week. If you have a data news tip, send it to us at [email protected].

Photo credit: David O'Leary


The 12th Python in Science conference, #scipy2013, just concluded, and the conference proceedings are now available. How was this superfast turnaround time possible? “For 2013 [the reviewers] followed a very lightweight review process, via comments on GitHub pull-requests.” Hopefully this remarkable publication method will achieve broader currency. If that’s not enough SciPy content for you this week, also check out Brad Chapman’s notes on day one and day two of the conference.

Escuela de Datos has launched. This new project from the School of Data is an example of the School’s efforts “to bring the School of Data methodologies and materials to people in their native languages”, transporting the School’s hands-on teaching approach to the Spanish-speaking world. OKF International Community Manager Zara Rahman reflects on meeting the Latin American open knowledge community.

Abre Latam, “the first unconference on open data and transparency in Latin American governments”, took place in Montevideo, Uruguay, on June 24th and 25th. Learn about what happened at Abre Latam in a La Nación blog post.

Poderopedia is a “data journalism website that uses public data, semantic web technology, and network visualizations to map who’s who in business and politics in Chile”. It is now also a platform: new sites on the Poderopedia model can be created by forking the Poderopedia GitHub repository.

Open Knowledge Foundation Nepal’s first meetup took place on the 28th of June. The meetup was an informal discussion of the OKFN’s nature and purpose, setting the agenda for future activities. Prakash Neupane provides a summary of the event.

RecordBreaker “turns your text-formatted data (logs, sensor readings, etc) into structured Avro data, without any need to write parsers or extractors”. It aims to reduce that most familiar of all obstacles to data analysis by automatically generating structure for text-embedded data.

dat is a new project—existing just as a mission statement, so far!—that aims to be “a set of tools to store, synchronize, manipulate and collaborate in a decentralized fashion on sets of data, hopefully enabling platforms analogous to GitHub to be built on top of it”. Derek Willis comments on its significance.

Communist is a JavaScript library that makes it easier to make use of the JavaScript threading tools called “workers” (surely such a library should be called Manager or Cadre? anyway…). Communist’s demos include data-pertinent items like parsing a dictionary and creating a census visualization.

If you’re wondering what can be learned about you from your metadata, check out Immersion, a meditative MIT Media Lab project which takes your Gmail metadata and returns “a tool for self-reflection at a time where the zeitgeist is one of self-promotion”.


How is the Brazilian uprising using Twitter? Check out this report for some revealing numbers and insights in the form of charts and network visualizations.

Some initial results from the Phototrails project have been posted. Phototrails mines visual data from Instagram to explore patterns in the photographic life of cities.

What went wrong at the G8 summit with the possibility of “a new global initiative to open up data that is needed to tackle tax havens”? OKFN policy director Jonathan Gray takes a look at what needs to happen in the way of G8 countries connecting “the dots between their commitments to opening up their data and their commitments to tackling tax havens”.

This has been a good month for OpenCorporates. Most recently, OpenCorporates has quietly started releasing visualizations of the network structures of corporate ownership. This visualization of the network of companies connected to Facebook Ireland gives a taste of what is to come.

La Gazette des Communes recently published an app breaking down “les péréquations horizontales” (horizontal fiscal equalization payments) region by region as a first step to evaluating the redistribution project’s success. La Gazette has now published the code and data for the app.

A new map visualizes Danish companies’ payment of corporate income tax for 2011. Built with MapBox, the map highlights a disturbing (albeit, as the authors hasten to point out, potentially legally explicable) amount of tax avoidance.

DATA SOURCES

A new site has launched, providing “open data about crime and policing in England, Wales and Northern Ireland” through both CSV downloads and an API under the Open Government Licence.

In what the BBC is hailing as “a historic moment”, the British National Health Service has released the first of a series of performance datasets on individual British surgeons, this set covering vascular surgeons. The data is available from the NHS Choices website.

The Global Observatory is a database that aims to document the “large-scale land acquisitions or ‘land grabs’” that have resulted in 32.8 million hectares of land falling into the hands of foreign investors since 2000. It has recently updated its online tool for “the crowdsourcing and visualisation of data as well as the verification of sources of such data”.

Foursquare has “created an authoritative source of polygons around a curated list of places”, merged it with “data licensed from many governments around the world”, and released the result, Quattroshapes, 30 gigabytes of geospatial data, under a Creative Commons license.
