From idea to story: planning the data journalism story
This is a report from the first workshop during the School of Data Journalism organised by Open Knowledge, European Journalism Centre and International Journalism Festival. The session was led by Steve Doig, Knight Chair in Journalism, specializing in computer-assisted reporting — the use of computers and social science techniques to help journalists do their jobs better.
You can download Steve’s presentation here.
Why do data journalism at all?
Steve’s take on this is that data journalism allows journalists to go beyond anecdotes and base their stories on facts and evidence. You can keep using anecdotes, but with data you can find the best ones: those that are most illustrative of that particular story.
So, how can journalists find data story ideas? Before anything else, look at the topics you already report on, such as sports, elections, disasters, crime investigations, money flows, etc. Almost anything journalists typically cover produces data which can be analysed.
Other places to get ideas for data journalism stories:
- See what other journalists are doing – If something is going on in one city, chances are it’s happening in your city too
- See featured projects on datadrivenjournalism.net
- IRE’s Extra Extra feed
- Have a look at the Guardian data blog, where the stories are not always investigative and not always about big, heavy topics like crime or social justice
- Read documents produced by government agencies and academics who collect large amounts of data. Pay attention to footnotes and bibliographies, which can lead to interesting data sources!
How do you get from an idea to a story?
Work backwards from your idea:
1. Think of the statements you want to make
Start with a hypothesis like: crime is getting worse in my area. For this hypothesis, you might want to make statements like: crime has increased by x amount, the amount of crime per 1000 people in such and such city is the greatest in our area, etc.
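As a rough sketch of the arithmetic behind statements like these, the snippet below works out a percentage change and a rate per 1,000 residents. The function names and all of the figures are assumptions invented for illustration; they do not come from the workshop.

```python
def percent_change(old, new):
    """Percentage change from old to new (positive means an increase)."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) * 100 / old


def rate_per_1000(count, population):
    """Number of incidents per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1000 / population


# Hypothetical figures, invented purely for illustration.
crimes_last_year = 400
crimes_this_year = 460
population = 20_000

change = percent_change(crimes_last_year, crimes_this_year)
rate = rate_per_1000(crimes_this_year, population)

print(f"Crime changed by {change:.1f}%")
print(f"{rate:.1f} crimes per 1,000 residents")
```

Using a rate rather than a raw count is what makes places with different populations comparable.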
2. Think of what variables you need to make the statements
Now ask what variables you need. Think in terms of a table of information: columns are the variables and rows are the individual data points.
There are two different kinds of variables:
- Categorical variables, which have labels (gender, type of crime, zip code)
- Numerical variables, which are counts (number of crimes, number of accidents, number of arrests)
Examples of variables: type of crime, population of the places where crime is happening, date of crime, time, location, number of victims, whether an arrest was made (y/n)
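To make the table idea concrete, here is a minimal sketch in which each row is one incident and each column is a variable. The column names and the records are hypothetical, made up for this example. Categorical variables get counted by label, while numerical ones get added up.

```python
import csv
import io
from collections import Counter

# Hypothetical records, invented for illustration: each row is one incident.
RAW = """crime_type,zip_code,date,victims,arrest_made
burglary,85004,2013-03-01,1,n
assault,85004,2013-03-02,2,y
burglary,85281,2013-03-02,1,n
theft,85281,2013-03-05,1,y
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Categorical variable: count how many rows carry each label.
crimes_by_type = Counter(row["crime_type"] for row in rows)

# A zip code looks like a number but is a label, so it stays as text.
crimes_by_zip = Counter(row["zip_code"] for row in rows)

# Numerical variable: convert the text to a number, then do arithmetic on it.
total_victims = sum(int(row["victims"]) for row in rows)

print(crimes_by_type.most_common())
print(crimes_by_zip.most_common())
print("Total victims:", total_victims)
```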
3. Think who collects the data
Once we know our variables, check who collects them. All the agencies we cover (every type of government, corporations, etc.) are collecting lots of information, so most of the time we don’t have to collect the data ourselves.
4. Get the data from there
Then you face the problem of getting the data. In the US there are pretty strong public records laws. In Europe as well, most countries have Freedom of Information laws or an official way to request data from public agencies.
Data formats
Don’t be intimidated by different formats. Know how you want to work with the data, for example in Excel. You don’t need to get the data in .xls, because you can use programmes to translate data from one format to another. Find a data nerd who can help you! One place to find good nerds is on forums or email lists:
- NICAR-L, a mailing list in the States where data journalists talk to each other
- School of data
One format you should try to avoid getting is PDF, because it doesn’t import well into other formats. If you are only given a PDF, there are tools like Tabula to export it into other formats.
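Translating data from one format to another is often only a few lines of work. The sketch below is one possible way to turn tab-separated text into comma-separated text that a spreadsheet can open. The helper name `convert_delimited` and the sample data are assumptions made up for this example.

```python
import csv
import io


def convert_delimited(text, in_delimiter="\t", out_delimiter=","):
    """Re-write delimited text with a different delimiter, quoting where needed."""
    reader = csv.reader(io.StringIO(text), delimiter=in_delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=out_delimiter, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical tab-separated export, invented for illustration.
tsv_text = "city\tamount\nPhoenix, AZ\t100\nTempe\t250\n"
csv_text = convert_delimited(tsv_text)
print(csv_text)
```

The `csv` module adds quotes around values that contain the new delimiter (such as "Phoenix, AZ" here), which a plain find-and-replace of tabs with commas would get wrong.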
5. Clean the data
Data is sometimes messy. A classic example is campaign finance information which has all been typed in by volunteers: names of cities are always misspelled! In this case you need to find all the cities which were misspelled and correct them so you can say, for example, how much was collected from a single city. People who collect data are doing it for bureaucratic purposes, and to them it doesn’t really matter how clean it is. People who use data for analysis need more precision and thus need to clean the data.
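The city-name cleanup described above can be sketched as follows. This is only one possible approach, not the method shown in the workshop: it matches each typed name against an assumed list of correct spellings using fuzzy matching from Python's standard library. The city list, the donations and the 0.8 cutoff are all hypothetical.

```python
import difflib
from collections import defaultdict

# Assumed list of correct spellings; in practice it would come from a trusted source.
CANONICAL_CITIES = ["Phoenix", "Tempe", "Scottsdale"]


def clean_city(raw, canonical=CANONICAL_CITIES, cutoff=0.8):
    """Return the closest canonical spelling, or the tidied input if nothing is close."""
    tidy = " ".join(raw.split()).title()  # trim stray spaces, normalise capitals
    matches = difflib.get_close_matches(tidy, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else tidy


# Hypothetical donations, invented for illustration: (city as typed, amount).
donations = [
    ("Phoenix", 100),
    ("phoenix ", 50),
    ("Pheonix", 25),
    ("Tempe", 40),
    ("Tempee", 10),
    ("Scotsdale", 75),
]

# Add up the amounts under the cleaned name.
totals = defaultdict(int)
for city, amount in donations:
    totals[clean_city(city)] += amount

for city, total in sorted(totals.items()):
    print(city, total)
```

Fuzzy matching can also merge two genuinely different places with similar names, so it is worth checking the proposed corrections by hand before publishing any totals.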
6. Once you have clean data, what do you do with it?
Look for patterns! Highs, lows, maximums, minimums, averages, etc. Get the shape of the data in your mind and look for outliers: anything in your data which is weird and stands out. Remember that many stories have been discovered by easy things like sorting.
Tools:
- Use simple spreadsheet features like sort, filter, functions and pivot tables
- Another tool is your brain: math and statistics, but it’s pretty much like 1+1=2!
- Resource for math http://t.co/CaZg5qS0jM
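The spreadsheet operations listed above (sort, filter, simple functions, pivot) have direct equivalents in code. The sketch below shows each one on a handful of hypothetical rows invented for illustration. The "more than twice the median" outlier flag is an assumption chosen for simplicity, not a rule from the workshop.

```python
import statistics
from collections import defaultdict

# Hypothetical rows, invented for illustration: (neighbourhood, crime type, incidents).
rows = [
    ("North", "burglary", 12),
    ("North", "theft", 30),
    ("South", "burglary", 9),
    ("South", "theft", 28),
    ("East", "burglary", 11),
    ("East", "theft", 95),
]

# Sort: largest counts first.
ranked = sorted(rows, key=lambda r: r[2], reverse=True)

# Filter: keep only one crime type.
thefts = [r for r in rows if r[1] == "theft"]

# Functions: highs, lows and averages.
counts = [r[2] for r in rows]
highest, lowest = max(counts), min(counts)
average = statistics.mean(counts)
median = statistics.median(counts)

# Pivot: total incidents per neighbourhood, summed across crime types.
by_area = defaultdict(int)
for area, _crime, n in rows:
    by_area[area] += n

# Crude outlier flag (an assumption): anything more than twice the median.
outliers = [r for r in rows if r[2] > 2 * median]

print("Top row:", ranked[0])
print("High/low:", highest, lowest)
print("Mean and median:", round(average, 1), median)
print("By area:", dict(by_area))
print("Outliers:", outliers)
```

Comparing the mean with the median is a quick way to get a feel for the shape of the data: a mean well above the median suggests a few very large values are pulling it up.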
Last, it’s important to remember that data journalism stories are best done in teams. There are many roles to cover in such a team, including other reporters, editors, graphic artists, photographers, videographers, page designers, web designers, app developers, etc.