Several Takes on Defining Data Journalism
Every so often I get asked the question: “so what is data journalism?” I’m still not sure I have a very good definition of it, but here are three different ways I think we can view it:
- as a particular sort of output – one of the easiest ways of responding to the question is to point to a map or graphic that someone has used to illustrate a story, or a piece of “award winning” data journalism, and say “that is”. Anyone who works with data, however, knows well that producing a graphic is often the easy part of the process, and that most of the time is spent finding the data, fighting with it to get it into a state where you can start working with it, and then analysing it, or asking it questions, in order to find the story within it or to illustrate a story you have already discovered. This observation in turn leads to a second way of characterising data journalism:
- as a particular set of skills – that is, data journalism is not necessarily what data journalists produce, it’s best thought about in terms of the sorts of skills that data journalists need in order to produce the maps and charts that get pointed at as examples of data journalism.
One way of identifying what these skills might be is to look at job adverts for “data journalist” (I collected a few examples here: So what is a data journalist exactly? A view from the job ads…). Looking through them, many current ads seem to require skills associated with the development of interactive data driven applications, which puts the emphasis on a range of web design and development skills, again apparently associating the practice of data journalism closely with the production of things that are used to illustrate a story. That is, data journalism is to data what radio journalism is to audio and video journalism is to, erm, video?! (It’s probably also worth mentioning that data journalism is not necessarily genre based journalism, such as science journalism or sports journalism – it’s not just “about” data.)
But that doesn’t feel right, either, which suggests a third way of considering data journalism:
- as a process – and in particular, as a process that involves data somehow, though not necessarily exclusively. Whilst there may be “data outputs”, it might also be the case that the data journalistic process generates a lead that develops into a story that is not best illustrated using “data”. Data might lead us to a story, for example, that one particular garment retailer tolerates poor working conditions, through the discovery that they use factories blacklisted by other retailers, but that story may be best expressed in other terms. The data, in other words, may simply play the role of a source, and in this sense “data journalism” is more process oriented, in much the same way that investigative journalism is, although potentially over much shorter timescales. (We might expect a data journalism piece to be produced in a matter of hours as part of the daily news cycle, for example.)
Under this process view of data journalism, the skills required of a journalist participating in the process may take the form simply of advanced information skills, such as the ability to run powerful advanced searches using web search engines, filter down a data set using text and/or numeric facets in a tool such as OpenRefine, or run structured queries over data in a database using a query language such as SQL.
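To make the “structured queries” idea a little more concrete, here is a minimal sketch of the sort of SQL question a journalist might ask when chasing the garment retailer lead mentioned above. Everything in it is an assumption made up for illustration: the table names (`supplies`, `blacklist`), the retailer and factory names, and the helper function `blacklisted_suppliers` are all hypothetical, and the data is invented. It uses Python’s built-in `sqlite3` module with an in-memory database.

```python
import sqlite3

# Hypothetical data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplies (retailer TEXT, factory TEXT);   -- who buys from whom
CREATE TABLE blacklist (retailer TEXT, factory TEXT);  -- who refuses to use whom
INSERT INTO supplies VALUES
  ('Retailer A', 'Factory 1'),
  ('Retailer A', 'Factory 2'),
  ('Retailer B', 'Factory 3'),
  ('Retailer C', 'Factory 2');
INSERT INTO blacklist VALUES
  ('Retailer B', 'Factory 2'),
  ('Retailer C', 'Factory 1');
""")


def blacklisted_suppliers(conn):
    """Find retailers using a factory that some *other* retailer has blacklisted."""
    query = """
    SELECT s.retailer, s.factory, b.retailer AS blacklisted_by
    FROM supplies AS s
    JOIN blacklist AS b ON b.factory = s.factory
    WHERE b.retailer <> s.retailer
    ORDER BY s.retailer, s.factory, b.retailer
    """
    return conn.execute(query).fetchall()


leads = blacklisted_suppliers(conn)
for retailer, factory, blacklisted_by in leads:
    print(f"{retailer} uses {factory}, which is blacklisted by {blacklisted_by}")
```

Each row that comes back is a lead rather than a story: it tells us where to start asking questions, not what the answers are.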
The process might equally involve using data visualisation tools to make sense of a dataset, or generate further questions from it, questions that might be additionally asked of the dataset itself, possibly in conjunction with other datasets, or alternatively used to set up a question then asked of a person.
For certain data sets, statistical tests may be required to identify whether there is something or nothing in what the data appears to be saying, or questions asked of an expert in the field to identify whether a number is actually a big number or not (hat tip to FT Undercover Economist, and More Or Less presenter, Tim Harford, for that refrain!). And then it may be time to get the interactive developers on board. Or there may be no need.
So are we any nearer to having a definition of “data journalism” that takes into account these different views?
Here’s one I quite like:
The art and practice of finding stories in data…
…and then retelling them.
This captures both the notion that data journalism is about finding stories from a particular sort of source (a data source) and then communicating them, whilst not requiring that the telling of the story is done in any particular way.
Here’s another:
Journalism in which “data” is one of the sources used to get or relate a story.
In this case, we see data as playing a role either in the sourcing of a story, or the communication of a story (or maybe even both), but again, we imagine data playing a role in “human” terms.
So what’s your favorite definition of data journalism?
See also: Data Journalism Handbook