Data roundup, February 20
We’re rounding up data news from the web each week. If you have a data news tip, send it to us at [email protected]
TOOLS, COURSES, AND EVENTS
A major update to the Overview Project was released on Friday morning. Overview is an open-source tool for journalistic data mining. The new release includes improved clustering and an integrated tweet display.
Learn how to do sentiment analysis in Python. Sentiment analysis is a natural language processing technique that has been growing in popularity with journalists. This tutorial will show you how it can be done, complete with code hosted on GitHub.
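To give a flavor of the idea before you dive into the tutorial, here is a minimal sketch of lexicon-based sentiment scoring in plain Python. It is not the tutorial's code: the tiny word lists and the function names `sentiment_score` and `label` are assumptions made up for illustration, and real projects typically rely on a much larger lexicon or a trained classifier.

```python
import re

# Tiny illustrative lexicon (an assumption for this sketch, not taken from the tutorial).
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / token count, or 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return (pos - neg) / len(tokens)


def label(text):
    """Map the numeric score onto a coarse three-way label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sample in ["I love this great chart", "What a terrible, awful map"]:
        print(sample, "->", label(sample))
```

A word-counting approach like this ignores negation and sarcasm, which is one reason it is best treated as a starting point.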
Babbage is a Clojure library and a model of data aggregation. Babbage streamlines the exploration of data in the Clojure REPL (read-eval-print loop) by breaking it down into three distinct and composable steps: creating inputs, partitioning records, and aggregating fields.
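The three-step model can be roughly sketched outside Clojure as well. The Python below is only an analogy, not Babbage's API: the helper names (`make_input`, `partition`, `aggregate`, `summarize`) and the sample records are hypothetical, chosen to show how the steps compose.

```python
from collections import defaultdict


def make_input(field):
    """Step 1: an 'input' is a function that pulls one value out of a record."""
    return lambda record: record[field]


def partition(records, key):
    """Step 2: group records by the value a key function returns."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)


def aggregate(records, inputs, stats):
    """Step 3: apply every statistic to the values of every input."""
    return {
        (input_name, stat_name): stat([extract(r) for r in records])
        for input_name, extract in inputs.items()
        for stat_name, stat in stats.items()
    }


def mean(values):
    return sum(values) / len(values)


def summarize(records, key, inputs, stats):
    """Compose the steps: partition first, then aggregate each group."""
    return {
        group: aggregate(members, inputs, stats)
        for group, members in partition(records, key).items()
    }


if __name__ == "__main__":
    # Hypothetical sample records, invented for this sketch.
    records = [
        {"region": "north", "amount": 4},
        {"region": "north", "amount": 6},
        {"region": "south", "amount": 3},
    ]
    result = summarize(
        records,
        key=make_input("region"),
        inputs={"amount": make_input("amount")},
        stats={"sum": sum, "mean": mean, "count": len},
    )
    print(result)
```

Because each step is an ordinary function, you can swap in a different partition key or add a statistic without touching the other two steps, which is the kind of composability the library's description points to.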
Daniel Navarro’s “Learning Statistics with R”, a textbook aimed at “psychology students and other beginners”, is now a free PDF download. It teaches both the rudiments of programming in general and the foundations of statistical computing in particular.
If you’ve just learned how to use D3.js or Raphaël and want to learn how real visualization developers do their job, here are seven dirty secrets of the trade to bear in mind as you approach real-world dataviz projects. Also check out the videos and slides from the last five Data Visualization NY meetups, the most recent of which cover data visualization using web standards.
A two-year research program exploring the impact of open data in developing countries has been launched by the World Wide Web Foundation. This program will conduct case studies in 14 countries from the global South in hopes of identifying the effects of open data policies in the developing world.
January was, in the words of Andy Kirk, a monster month for data visualization. His new collection of January’s best of the data visualization web testifies to the truth of that statement. It includes not only visualization but news and learning resources.
The CCAPS Conflict Dashboard is a new data tool for tracking emerging conflicts in Africa produced by the Climate Change and African Political Stability Program. It allows users to relate the CCAPS conflict datasets to a range of socioeconomic factors and to track the effects of conflict on communities.
Scarce German aid project data has been mapped by Christian Kreuz. Kreuz discusses the collection and presentation of the data in detail in his post.
Friday’s meteor explosion over Russia has put meteorites on many people’s minds. Stefan Geens used freely available data to reconstruct the meteor’s path. The Guardian, meanwhile, released data and a map covering every recorded meteorite fall on Earth.
Also on many people’s minds is the European horsemeat scandal. Tony Hirst shows how to process and visualize horsemeat data to better understand the snafu.
The Why Axis breaks down the “chart warfare” between the New York Times and Tesla Motors over the NYT’s bad review. This war of infographics demonstrates the high value given to data visualization in today’s media.
A dataset of 10,000 porn stars was analyzed over a period of six months by Jon Millward. The result is a close look at “what the average performer looks like, what they do on film, and how their role has evolved over the last forty years”.
Why is the conviction rate for rape so low? A new infographic from Information Is Beautiful addresses the complex realities underlying this problem. The “thinking around the data” is addressed in a supplementary blog post.
NPR’s chart checkers, with help from Stephen Few and Nathan Yau, examine US President Obama’s graphics from his recent multimedia State of the Union address to work out whether they truly enhanced understanding. (Answer: well, sort of.)
The New York Times’s interactive dataviz of President Obama’s budget proposal is a good example of the power of D3.js. The visualization explores the data from four angles but maintains visual continuity thanks to D3.js’s animated transitions.
The Greek open data hub, developed and hosted by OKFN Greece on the CKAN platform, has been launched. More than just a data catalogue, the hub includes examples of apps built with the data, a live demo of SPARQL query functionality, and information on the Greek Linked Open Data cloud. Other new CKAN portals have been collected on the CKAN blog.
Hot on the heels of the German open data row, the German government’s data portal has gone live and has been greeted with much criticism. Meanwhile, Open Data Hamburg and Offene Daten Moers have also been launched, providing data on several areas of city administration.
The Buildings Performance Institute Europe has made its data “open data ready” by enabling CSV and PDF downloads. This data portal brings together “a wide variety of technical data never before collected EU-wide”.
The US Energy Department has opened an energy resource hub including features on new APIs, datasets, and advanced search capabilities.
Significant updates to DataBC, the British Columbia data portal, were unveiled on Monday. These include a new user interface and search capabilities.