Data roundup, June 12

June 12, 2013 in Data Roundup

We’re rounding up data news from the web each week. If you have a data news tip, send it to us at [email protected].

Photo credit: Kris Krüg

TOOLS, COURSES, AND EVENTS

The World Wide Web Foundation and its partners (including the OKFN) have launched the Global Open Data Initiative, “a champion for Open Data globally”, aiming to create and promote a unified set of guidelines assisting governments in the use of open data.

Today wraps up the second Open Economics Workshop, an Open Knowledge Foundation event hosted at MIT. As reported on the Open Econ blog, the event brought together some 40 economists and social scientists to discuss research data sharing and transparency in economics.

Data-Crunched Democracy was a conference bringing together journalists and analysts “to cut through the hype and understand the use of voter data in campaigns”. Derek Willis reflects on “the lessons for journalists covering campaigns that engage in the use of data” in an in-depth blog post.

I’ve known more than one graduate student in the social sciences who has described Excel’s pivot tables as “the best thing ever”. Pivot tables are a powerful tool for data exploration. A new blog post by Abbott Katz explains how you can begin using pivot tables in your own work.
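
The Katz post is about Excel, but the same exploration pattern is available to Python users through pandas’ pivot_table; the sketch below is a minimal illustration with made-up data, not an example from the post itself.

```python
import pandas as pd

# Toy data standing in for a real dataset; the column names are invented for illustration.
df = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "sales":   [120, 80, 200, 150, 90],
})

# A pivot table: one row per region, one column per product, cells holding total sales.
pivot = pd.pivot_table(df, values="sales", index="region",
                       columns="product", aggfunc="sum", fill_value=0)
print(pivot)
```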

Real-time and historical data on United States drone strikes is now available as an API. Dronestre.am is a public API making it easy to “build data visualizations about covert war […] in Pakistan, Yemen, and Somalia”.
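
If you want to poke at the feed from Python, a rough sketch follows; note that the endpoint URL and the shape of the response below are assumptions for illustration, so check dronestre.am’s own documentation before building on them.

```python
import requests

# NOTE: this endpoint is an assumption for illustration only; consult
# dronestre.am for the documented API URL and response format.
URL = "http://api.dronestre.am/data"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
payload = resp.json()

# Inspect the response before assuming any field names.
if isinstance(payload, dict):
    print("top-level keys:", list(payload.keys()))
else:
    print("records returned:", len(payload))
```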

Learn about pandas, “one of the best, and most important, libraries for data analysis in Python”, and how it can be used together with SQL queries to do serious data analysis, in a new blog post by John Beieler.
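
As a rough sketch of that pandas-plus-SQL workflow (the table and column names here are invented, not taken from Beieler’s post), pandas.read_sql_query can pull a query result straight into a DataFrame:

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database with a toy table; all names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (section TEXT, year INTEGER, views INTEGER)")
conn.executemany(
    "INSERT INTO stories VALUES (?, ?, ?)",
    [("politics", 2012, 4600), ("sports", 2012, 4100), ("politics", 2013, 1200)],
)
conn.commit()

# Run a SQL query and get the result back as a DataFrame for further analysis.
df = pd.read_sql_query(
    "SELECT section, SUM(views) AS total_views FROM stories GROUP BY section", conn
)
print(df)
```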

Bayesian Methods for Hackers, an introduction to Bayesian probability theory in practical and Pythonic terms, has appeared on the data roundup before. Now a draft of the PDF version of the book has been released. Check out this “understanding-first” introduction to “the natural approach to inference”.
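
The book builds its examples on probabilistic programming, but the flavour of the “understanding-first” approach can be sketched with a simple conjugate coin-flip update (the numbers below are made up):

```python
from scipy import stats

# Made-up data: 14 heads observed in 20 flips of a possibly biased coin.
heads, flips = 14, 20

# Start from a flat Beta(1, 1) prior on the probability of heads. Because the
# Beta prior is conjugate to the binomial likelihood, the posterior is simply
# Beta(1 + heads, 1 + tails).
posterior = stats.beta(1 + heads, 1 + (flips - heads))

print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```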

Check out Source’s journalism code event roundup, June 10, for a worldwide selection of hackathons and conferences in data-driven and computer-assisted journalism.

DATA STORIES

So the NSA has all your metadata. What can they do with it? German Green Party politician Malte Spitz sued his mobile carrier to obtain six months of his own phone data and made it available to Zeit Online, which combined it with publicly available information to reconstruct six months of Spitz’s life. You can read more about the project and download its data. You can also check out a timeline of the NSA’s domestic spying from the Electronic Frontier Foundation.

ProjectPolicy aims to “unify, organize and visualize the world’s government information onto one intuitive web platform”. Its San Francisco demo gives a taste of what that could look like.

America’s Worst Charities presents a year-long investigation by the Tampa Bay Times and the Center for Investigative Reporting into the misuse of donated funds by American charities. It prominently features an interactive presentation of the data, some of which is also available for download in CSV form.

The central limit theorem is a statistical result whose scientific importance cannot be overstated. A new visualization of the theorem, built with D3.js and explained in terms of coin flips, makes it easier to develop an intuition for what it means.
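
The same coin-flip intuition can be reproduced numerically in a few lines; here is a rough NumPy sketch (the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Flip a fair coin n times, take the proportion of heads, and repeat 10,000 times.
# The central limit theorem says those proportions are approximately normally
# distributed around 0.5 with standard deviation sqrt(0.25 / n).
for n in (10, 100, 1000):
    means = rng.binomial(n, 0.5, size=10_000) / n
    print(f"n={n:5d}  mean={means.mean():.4f}  std={means.std():.4f}  "
          f"CLT std={np.sqrt(0.25 / n):.4f}")
```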

Stamen has put together 3D contour maps of the surface of Mars from data collected by the Mars Orbiter Laser Altimeter. As their blog reports, these maps are “a small gesture of thanks to the scientists who are working hard to do science and communicate with the public despite the stupid sequester”.

The latest work from Accurat presents the lives of ten famous painters in the form of beautiful timelines. Each timeline presents the artist’s personal history in a manner sensitive to the artist’s style.

Check out datenjournalist.de’s roundup of Datenjournalismus im Mai 2013 (“Data journalism in May 2013”, in German) for a collection of some of last month’s best examples of data-driven journalism.

DATA SOURCES

In a move that is unlikely to distract attention from the PRISM scandal, the Obama administration has released a portal calling out climate science deniers.

Open Nepal has launched Open Data Nepal, a project “not about creating yet another data repository in the web but an effort to curate and disseminate data that is already available in public domain”.

Canada’s Global News has obtained, with great difficulty, a database of over 61,000 Albertan oil spill incidents spanning the period from 1975 to 2013, and they are “now offering this information to the public for download”. This is certainly one of the most important datasets to see the light of day in Alberta, especially since Alberta’s open data catalogue has been described as perhaps “the most useless […] in the history of open data catalogues”.

The Los Angeles Times has acquired and released a database of the salaries of Department of Water and Power employees in 2012, finding that their “average total pay … is more than 50% higher than other city employees”. You can download the dataset and see for yourself.

Freddie Mac, a major US mortgage backer, is “standardizing its processes and making raw data more easily accessible to the public”. This move towards “transparency” appears to be part of a process of privatization of government-sponsored mortgages, “using our data to attract private capital”.
