Data Expeditions at MozFest
Expeditions into the Data Landscape: the School of Data goes to #MozFest
Find out what happened at MozFest – and see the tools and data sets to recreate it yourself!
Saturday morning at MozFest. A sold-out building, full of a thousand hackers, builders, makers, geeks, journalists, thinkers and more. And right at the top on the 9th floor? Three ‘data sherpas’ in sparkly cloaks…
Data Expeditions
The concept behind the ‘Data Expeditions’ run by the School of Data at this year’s MozFest was simple. Based on the ‘Dungeons and Dragons’ role-playing game, data explorers would tackle real world problems together, developing their data wrangling skills in the process.
As a first step, explorers were asked to rate their abilities. Can you tell a story? analyse data? code? tweet? draw? The emphasis was on ‘doing’, but not in any narrow sense – often, it’s the data newbie asking a ‘stupid question’ that sets the team on a fresh track, and becomes the biggest contribution of the day.
Next came the quests. Three Data Sherpas (still sparkling) set out three missions: delving into the data surrounding extractive industries and oil mines; exploring possible causes for a dramatic plummet in life expectancy in central Africa; and burrowing into the grimy world of tax havens.
The explorers divided, the sherpas guided – and the quests began!
Quest 1: Mining the Mines
The discovery of oil or other natural resources in a country, and the mining and extraction activities that follow, have enormous economic and political significance. While some countries benefit from their natural wealth, others fall prey to corruption and exploitation. We approached this topic without a clear story to investigate – instead, the first part of our session focussed on how to approach such a complex domain. After some discussion (luckily, the large team included two experts from the area and an investigative reporter), three theme areas emerged, which we then dug into in smaller teams:
- One team worked on possible ways to combine company ownership information, conference documentation and social network data to generate a picture of the network of actors, companies and interests behind the extractive industry.
- A second team decided to use a commercial database to explore the ownership of a single mine in the DRC. Where did the money come from, and who are the owners? A quick set of post-its on our data expeditions map served as a visualization of the setup.
- Mapping was also the topic of the third group, which aimed to contrast overall revenue from extractives with economic, political and social indicators, such as the Corruption Perceptions Index. Using CartoDB, the group was able to easily generate a map that displayed country-by-country comparisons of the resulting ratios.
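For anyone wanting to try the third group's approach, here is a minimal sketch of the join-and-ratio step that has to happen before anything goes onto a map. It is not the group's actual workflow: the country codes, the figures and the `revenue_to_indicator_ratio` helper are all hypothetical placeholders, and real data would come from sources such as those listed at the end of this post.

```python
# Placeholder values for illustration only; these are NOT real figures.
# Keys are made-up country codes.
extractive_revenue = {"AAA": 1200.0, "BBB": 300.0, "CCC": 4500.0}
indicator_score = {"AAA": 40.0, "BBB": 60.0, "DDD": 75.0}


def revenue_to_indicator_ratio(revenue, indicator):
    """Join two tables on country code and return revenue / indicator.

    Countries missing from either table, or with an indicator of zero,
    are skipped rather than guessed at.
    """
    ratios = {}
    for code in sorted(revenue.keys() & indicator.keys()):
        if indicator[code] == 0:
            continue
        ratios[code] = revenue[code] / indicator[code]
    return ratios


ratios = revenue_to_indicator_ratio(extractive_revenue, indicator_score)
for code, ratio in ratios.items():
    print(f"{code}: {ratio:.1f}")
```

Only countries present in both tables make it into the result, which is worth remembering when a country is unexpectedly blank on the finished map.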
Quest 2: A Call to Investigate an African Crisis
In true Dungeons & Dragons style, Data Sherpa Michael got a call from some dwarves in Middle Earth, who had heard about a sudden drop in life expectancy in central Africa. They didn’t know the details, but believed that the World Bank gnomes might have some facts which could shed some light on the mystery.
Cue the explorers in quest group two, who worked together throughout (kudos to such a large number!) to solve the mystery. After initial musings about a civil war, the team discovered a striking correlation between the increasing prevalence of HIV and plummeting life expectancies. By cross-referencing with other data sets, the team also noticed some interesting connections around health expenditure, public statements issued by politicians, and quirkier topics such as the target audience of condom marketing. More work would need to be done to really make a claim about causality, but there was certainly plenty to mull over.
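To get a feel for what checking such a correlation involves, here is a small sketch that computes a Pearson correlation coefficient by hand. The numbers are made up (and deliberately perfectly linear) purely to show the mechanics; they are not the World Bank figures the team used, and `pearson` is our own illustrative helper rather than anything the team wrote.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Synthetic yearly values, invented for this example only.
hiv_prevalence = [1.0, 2.0, 3.0, 4.0, 5.0]          # percent of adults
life_expectancy = [60.0, 58.0, 56.0, 54.0, 52.0]    # years

r = pearson(hiv_prevalence, life_expectancy)
print(f"r = {r:.2f}")
```

A value close to -1 means the two series move in opposite directions almost in lockstep. As noted above, that alone says nothing about causality.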
Quest 3: Tax Islands
This was an experiment in providing a group with a chain of possible investigations (a map for the landscape) and then allowing a storyteller to choose their own expedition path through the data. The group divided into two teams to explore the possible stories (routes) you might want to take through tax avoidance and evasion.
The first group chose to show how an online book retailer might avoid tax, starting at the point of sale and tracing the money all the way through to the final countries in which tax was paid (and at what rate!). The second group wanted to show the effects of changes in tax laws, and looked at where large companies paid their tax and how they ‘moved’ as tax breaks changed.
The session was a big success. People really engaged with the issue, and the tax team benefitted from some particularly valuable insights from a few accountants who had direct experience of working on corporation tax for large companies. The format really worked (unless it was the spangly cloaks!) and our data expedition troops stayed at their desks until the very end.
Next steps: Online Mountaineering
The Data Expeditions format was somewhat experimental. We had no idea if the concept would work, but our inkling was that the only way to really teach data skills was to confront people with a mountain. By forging their own path (with the occasional leg-up or guidance from a sherpa!), data explorers can pinpoint the extra skills they need to develop in order to scale new obstacles, map their own journey and ultimately tell their own story. The answer may be at the top, but there are multiple routes to the summit – and each will offer a fresh view over the landscape.
Because the session was so successful, we are keen to repeat the Data Expeditions formula. Our next challenges will be:
- To work out how to recreate this social dynamic online
- To continue to follow up on these threads, questions and leads
To do this, we need your help!
- Were you at the Data Expeditions session at MozFest? Write a short summary of what your team did and what you learned and send it to schoolofdata[@]okfn.org – we’d love to feature it on our blog!
- Keen to run your own Data Expeditions session? Please do! You can find some of the resources we used below. Additionally, see the ‘Data Expeditions Toolkit’ below – sign up to the mailing list and drop us a line at schoolofdata [@] okfn.org to find out more.
- Know of more resources? Drop us a line via the mailing list or schoolofdata [@] okfn.org to let us know!
Recreate it yourself!
Use the Expeditions Toolkit
- Print out a copy of the character sheet (front, back) for all of the people participating
- Think of your topic areas and devise a suitably ridiculous name for your expedition. (Bonus points for ridiculous puns revolving around online gaming).
- Make some role description cards. For each of the possible roles outlined in the character sheets, outline tasks which people with that skillset could perform. We recommend at least 3 possible levels.
- Buy yourself a cape (optional)
- Get rolling – hand out your role description sheets, get people to fill in the radar plot and assign roles. Allow people to also specify a role that they are not so strong in, but which they would like to know more about. You can buddy them up with someone who is more advanced in those skills and encourage them to watch closely and ask lots of questions.
- Talk everyone through the notion of the expedition and explain their roles to them. Make it clear the aim is to produce something at the end of the session. That could be a blog post, a visualisation or a load of post-it leads – don’t specify, let them be as creative as possible!
- Start the storytellers off thinking of a question and get them talking to the scouts and analysts about where they might find that data. You’ll need lots of post-it notes.
- Get the designers and engineers listening in to the conversations happening and working out how it might be possible to present the information, and feed back into the discussion
- Once you’ve got a question, set the scouts and the analysts loose on finding and analysing the data.
- Get everyone to document their expedition: the avenues they tried which failed for some reason (the path was blocked), what worked, what data sources they found and what tools they used. These are all useful for generating leads which people could follow up on afterwards, and for teaching people how a real data campaign may be run.
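If you have a large group, the buddying-up step can be sketched in a few lines of code. This is only one possible way to do it, not part of the toolkit: the names, the 1 to 5 rating scale and the `find_buddy` helper are all assumptions made for illustration, with ratings standing in for what people draw on their radar plots.

```python
# Hypothetical participants with self-rated skills (assumed 1-5 scale)
# and the one role each would like to learn more about.
participants = {
    "Ada": {
        "ratings": {"storyteller": 1, "scout": 2, "analyst": 5,
                    "designer": 1, "engineer": 4},
        "wants_to_learn": "storyteller",
    },
    "Ben": {
        "ratings": {"storyteller": 5, "scout": 3, "analyst": 2,
                    "designer": 2, "engineer": 1},
        "wants_to_learn": "engineer",
    },
    "Cy": {
        "ratings": {"storyteller": 3, "scout": 4, "analyst": 1,
                    "designer": 5, "engineer": 2},
        "wants_to_learn": "analyst",
    },
}


def find_buddy(name, people):
    """Return the strongest other person in the role `name` wants to learn.

    Returns None if nobody else rates themselves higher than the learner.
    Ties go to the alphabetically first name.
    """
    role = people[name]["wants_to_learn"]
    own = people[name]["ratings"].get(role, 0)
    others = sorted(other for other in people if other != name)
    if not others:
        return None
    best = max(others, key=lambda other: people[other]["ratings"].get(role, 0))
    if people[best]["ratings"].get(role, 0) <= own:
        return None
    return best


buddies = {name: find_buddy(name, participants) for name in participants}
for learner, buddy in buddies.items():
    print(f"{learner} shadows {buddy}")
```

In a room of ten people a quick look at the character sheets does the same job, so treat this as a convenience for bigger or online sessions.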
We did ours in 3 hours. You may like to run yours for longer, but keep the session short enough to hold people’s full attention throughout and keep energy high.
That’s it. Good luck, noble sherpas.
Resources that we used:
Data Sources
- Global Data
- International Monetary Fund
- US Securities and Exchange Commission Filings
- Statistics from HMRC (UK)
- Open Corporates
- DueDil
- BP Statistical Review of World Energy
- Revenue Watch
- Transparency International’s Corruption Perceptions Index
- World Bank
- Annual Reports from individual companies
Tools & Resources
- Excel
- Processing JS
- Visual.ly
- CartoDB
- Raphael JS
- Recline Timeliner
- Evan Raskob’s Intro to Programming
- Data Expeditions Character Sheet