Data roundup, February 27
We’re rounding up data news from the web each week. If you have a data news tip, send it to us at [email protected].
TOOLS, COURSES, AND EVENTS
This past Saturday was International Open Data Day. Over a hundred cities worldwide hosted events in which applications were built, data was liberated, and the open data cause was promoted. The interactive IODD map gives some sense of the scale of the event.
Visualizing.org launched the Global Development Sprint in collaboration with the World Bank on Open Data Day. This is “a collaborative data visualization project where anyone can fork code to visualize the same open data set”, aiming to demonstrate the benefit of open data policies.
Probabilistic programming, as Rob Zinkov explains, is a powerful and intuitive way of specifying complex models. You can learn this paradigm with “Probabilistic Programming and Bayesian Methods for Hackers”, a GitHub-hosted ebook that provides a hands-on introduction to Bayesian inference in Python and PyMC.
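For readers who want a feel for Bayesian inference before opening the ebook, here is a minimal sketch in plain Python. It is not taken from the ebook and does not use PyMC. It assumes a flat prior over a success probability and approximates the posterior on a grid; the function name and the observed counts (7 successes in 10 trials) are made up for illustration.

```python
def grid_posterior(successes, trials, grid_size=101):
    """Approximate the posterior over a success probability p.

    Assumes a flat prior, so the posterior is proportional to the
    binomial likelihood evaluated at each grid point.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalized posterior: constant prior times the likelihood kernel.
    weights = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


# Hypothetical data: 7 successes observed in 10 trials.
grid, posterior = grid_posterior(successes=7, trials=10)

# Summaries of the posterior: its mean and its most probable grid point.
posterior_mean = sum(p * w for p, w in zip(grid, posterior))
map_estimate = max(zip(posterior, grid))[1]

print(f"posterior mean: {posterior_mean:.3f}")
print(f"most probable value: {map_estimate:.2f}")
```

A probabilistic programming library automates this kind of calculation for models with many parameters, where a brute-force grid is no longer practical.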
Syracuse University’s Jeffrey Stanton has made his “Introduction to Data Science”, an interactive textbook for a non-technical audience, available for free. Syracuse will also offer a free and open online course, “A Brief Introduction to Data Science with R”, later this month.
DataFreeze allows you to script static JSON and CSV exports of relational databases for use in data-driven apps. As its creator explains, this is a handy solution to the problem posed by the cumbersome datastores used by high-volume websites.
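The general idea of freezing a database query into static files can be sketched with Python's standard library alone. This is not DataFreeze's own interface; the helper names, the in-memory SQLite table, and its rows are hypothetical and exist only to make the example self-contained.

```python
import csv
import io
import json
import sqlite3


def freeze_query(conn, query):
    """Run a query and return (column_names, rows) as plain Python data."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


def to_json(columns, rows):
    """Serialize the rows as a JSON array of objects keyed by column name."""
    return json.dumps([dict(zip(columns, row)) for row in rows], indent=2)


def to_csv(columns, rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (name TEXT, value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [("a", 1), ("b", 2)])

columns, rows = freeze_query(conn, "SELECT name, value FROM readings ORDER BY name")
json_export = to_json(columns, rows)
csv_export = to_csv(columns, rows)

print(json_export)
print(csv_export)
```

In practice the two strings would be written to files and served statically, so the app never queries the live database.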
Videos of over 300 natural language processing lectures have been posted to Vimeo by Chris Callison-Burch. Everyone who works with text—that is, just about everyone!—should check out this collection of cutting-edge research presentations.
For a fun but demanding introduction to d3.js, check out the slides from Tom MacWright’s presentation to the DC jQuery Users Group. The presentation will make the most sense to readers with strong JavaScript skills.
DATA STORIES
Open Data Day 2013 has already borne a great deal of data-journalistic fruit. This roundup by The Atlantic Cities captures some of the highlights. These include a dashboard of Vancouver rental housing issues, a map of Head Start programs in Oakland, and a map of tree species in Washington, DC.
Despite power outages, OpenStreetMap Nepal made great progress mapping Kathmandu on Open Data Day. Nirab Pudasaini blogs about it.
All of the news graphics by the Guardian and the New York Times are now available in a single place. Prepare to lose hours of your life to this compendium of data storytelling.
Last week, Princeton researchers produced an infographic on the virtual water trade. This week, Density Design has produced an interactive visualization of the scientific debate over virtual water trade, showing how this research is situated in its material context.
How did the £13 million bill for the Olympic torch relay get paid? Three investigators submitted over 100 FOI requests and concluded that it was drawn from “funds intended for maritime festivals, economic development, council reserves and food markets”.
A new map of the Twitter languages of New York City visually captures some of the massive linguistic diversity of the city. The map represents 8.5 million geocoded tweets collected over a three-year period. It is a sequel to Twitter Tongues, another map which did the same for London.
What impact will the NYC MTA fare hike have on different communities? Ingrid Burrington created an interactive map to explore this question. The map breaks down each MTA station’s ridership by various pay levels and identifies the median income of each station’s neighborhood.
DATA SOURCES
A veritable deluge of Italian open data has been released to commemorate Open Data Day. This includes data portals for the Italian senate, the city of Venice, the city of Trento, and the region of Puglia. Geodata for the province of Bolzano was also handed over to OpenStreetMap.
The Berlin Open Data Community portal was launched on Open Data Day by OKFN Germany. This portal is intended to be complementary to, rather than in competition with, official government data portals.
In other German open data news, the Hanseatic city of Rostock has launched an alpha of its data portal. The preliminary site already offers 91 datasets.
The city of Chicago has taken selected datasets from its data portal and put them on GitHub. The repos contain example code and instructions to help you make use of the data.