Data roundup, February 13

February 13, 2013 in Data Roundup

We’re rounding up data news from the web each week. If you have a data news tip, send it to us at [email protected].

Photo credit: Ashley Mayes

TOOLS, COURSES, AND EVENTS

The Guardian, in cooperation with Google and the Open Knowledge Foundation, is holding a data visualization contest with a $2,000 prize. To quote Simon Rogers: “All we want you to do is to take an open dataset from any government open data website […] and visualise it.” The contest is open to citizens of the US, the UK, France, Germany, Spain, the Netherlands, and Sweden.

If you need help picking optimally distinct color palettes for your contest entry, try iWantHue, a new app for data scientists: choose your color space and your clustering method, and watch an optimal palette come together.
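For readers curious how clustering can produce a distinct palette, here is a minimal sketch of the general idea. It is not iWantHue's actual code, which we haven't examined. It assumes CIE L*a*b* as the color space and plain k-means as the clustering method: sample random sRGB colors, convert them to L*a*b* (where straight-line distance roughly tracks perceived difference), cluster them into k groups, and report the sampled color nearest each cluster center. The helper names (`srgb_to_lab`, `dist2`, `kmeans_palette`) and the default sample and iteration counts are our own hypothetical choices.

```python
import random


def srgb_to_lab(rgb):
    """Convert an (r, g, b) tuple with 0-255 channels to CIE L*a*b* (D65 white)."""
    def linearize(c):
        # Undo the sRGB gamma curve
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ, normalized by the D65 reference white
    x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
    y = (0.2126 * r + 0.7152 * g + 0.0722 * b) / 1.00000
    z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883

    def f(t):
        return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0

    fx, fy, fz = f(x), f(y), f(z)
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def dist2(p, q):
    """Squared Euclidean distance; in L*a*b* this is the squared CIE76 delta E."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_palette(k, n_samples=2000, iterations=20, seed=0):
    """Hypothetical helper: pick k mutually distinct colors by k-means in L*a*b*."""
    rng = random.Random(seed)
    samples = [tuple(rng.randrange(256) for _ in range(3)) for _ in range(n_samples)]
    labs = [srgb_to_lab(c) for c in samples]
    centroids = rng.sample(labs, k)  # start from k randomly chosen sampled colors

    for _ in range(iterations):
        # Assignment step: attach each sample to its nearest centroid
        clusters = [[] for _ in range(k)]
        for lab in labs:
            nearest = min(range(k), key=lambda i: dist2(lab, centroids[i]))
            clusters[nearest].append(lab)
        # Update step: move each centroid to the mean of its members
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(3)
                )

    # Report the real sampled color closest to each centroid, as a hex string
    palette = []
    for centroid in centroids:
        best = min(range(n_samples), key=lambda j: dist2(labs[j], centroid))
        palette.append("#%02x%02x%02x" % samples[best])
    return palette


if __name__ == "__main__":
    print(kmeans_palette(6))
```

Returning the nearest sampled color, rather than the centroid itself, keeps the sketch from needing an L*a*b*-to-sRGB conversion and guarantees every output is a displayable color.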

New to data visualization? The Knight Digital Media Center will soon offer free online data training with an emphasis on visualization: four live workshops led by Len De Groot. Registration opens February 18.

R users will want to learn about the 10 R packages Yhat wishes it had known about earlier. These packages make it easier to work with strings, do time series analysis, and build random forest classifiers.

Statistical coders who love Python, meanwhile, will enjoy “Will it Python?”, a series of tutorials giving the author’s “attempts to port data analyses originally done in R”. It includes great examples for scikit-learn, Pandas, and more.

In other Python data science news, Hyperopt, a library for hyperparameter optimization built “for optimizing over awkward search spaces with real-valued, discrete, and conditional dimensions,” is now available on PyPI. Consult the wiki for details and documentation.

If you missed the Journalism Interactive 2013 talk “Data: Practical Tips from the Field” by Ken Schwencke, Sisi Wei, and Derek Willis, don’t worry: they’ve posted their slides online, including the presentation handout and links to examples.

In fact, if you missed Journalism Interactive 2013 altogether, check out Dan Dreimold’s “somewhat live” blog post on the conference: a list of 100 things Dan learned (and you can learn too).

DATA STORIES

Violence against First Nations women in Canada is the subject of a new map visualization by the activist group Anonymous. The map calls attention to “the horrors of missing and murdered Indigenous women,” which Anonymous had earlier accused Canadian police of ignoring in a video posted to YouTube.

Does Congress really suck? That is the question behind Nilkanth Patel’s new data visualization project, which evaluates the US Congress’s effectiveness, its relationship to donors, its partisanship, and its public reputation.

Which changes proposed by lobbyists went straight into the amendments EU committee members filed on the General Data Protection Regulation? LobbyPlag, a data exploration web app built by OpenDataCity, addresses this question; its underlying data is freely available from the site.

Follow Numeroteca’s development of a visualization of corruption coverage in Spain, and learn how to use PageOneX to trace how stories evolve on newspaper front pages.

Version 2.0 of Checkbook NYC has been released, offering a new view of the City of New York’s day-to-day spending through a series of interactive graphs and tables.

Clever°Franke’s social media sentiment map is a beautiful infographic tracing the relationship between Dutch meteorological data and social media sentiment about the weather. Its overall message: “The Dutch generally don’t deceive themselves about meteorological circumstances.”

Source has published an interesting conversation with dataviz experts Scott Murray, Rachel Binx, Mike Bostock, and Tom Carden. For another data visualizer’s perspective, check out this interview with Benjamin Wiederkehr of datavisualization.ch.

DATA SOURCES

The University of Nairobi has adopted an open access policy for its faculty’s academic output and has launched a new data portal. The portal will host journal articles, research data, theses, and other publications and productions of the University.

The US government has released consumer.data.gov, “a centralized portal to federal government data and resources that can empower consumers to make better informed choices”.

The ClueWeb12 dataset, an archive of 870,043,929 English web pages collected in 2012, has been released. Carnegie Mellon University makes it available for research purposes for a fee, which covers the hard disks on which the dataset is shipped to you.

The European Commission has launched a new data portal, the open data hub of the European Union. Already, 5,831 European Commission datasets are available for download.

Alex Singleton’s 2011 Census Open Atlas Project is, as the Guardian reports, a major event in the curation and presentation of open data. Singleton’s work comprises 127,466 maps of the 354 local authorities of England and Wales, accompanied by R source code and data.

The government of Aragon has opened a new data portal, finally putting into motion its longstanding plan to stimulate economic activity through open data.
