OKFN at GEN Editors Lab Hackday in Barcelona

Concha Catalan - April 17, 2014 in Data Journalism

Global Editors Network (GEN) is an association of newsroom editors from different countries. Based in Paris, it promotes innovation in journalism. This year it has organised a score of Editors Lab Hackdays around the world to develop journalism projects in 48 hours. I had the privilege of participating in the hackday in Barcelona on 13th and 14th March, representing the Open Knowledge Foundation (OKFN) together with David da Silva (@dasilvacontin) and Joan Jové (@joanjsTW).

The GEN hackday slogan, Hack the newsroom, promotes a change in newsroom routine by way of journalists, programmers, and designers working together. That is why most teams competing at hackdays are made up of people employed by media outlets. That was not our case. We brought in the spirit of real hackathons: we only formed a team once we got there.

Since August 2013, hosted by media organisations worldwide, GEN representatives have run hackdays in Buenos Aires (Clarín), Cape Town (Media 24), London (The Guardian), Rome (L’Espresso), Paris (Le Parisien), Sunnyvale, USA (Yahoo!), Moscow (RIA Novosti), Brussels (VRT), and Madrid (El País). From Barcelona, GEN was on its way to Cairo (Al-Masry Al-Youm) and then New York (New York Times/Reuters). A different topic is set for each hackday: in Madrid it was liveblogging; in Cairo, map-based apps to improve city traffic through crowdsourcing.

Caixaforum Barcelona; photo by Ferran Moreno Lanza

The Col·legi de Periodistes de Catalunya hosted the hackday in Barcelona. It took place in Caixaforum, on the topic “Election Coverage: How to create new interactive tools to encourage civic engagement and productive political discussion? How to better use data to make your coverage more relevant to voters?” We also had two masterclasses. Luis Collado and Elías Moreno from Google showed a few data analysis tools. Mirko Lorenz from Journalism++ presented inspiring projects.

Our team, looking ahead to the upcoming European Parliament elections on 25th May 2014, presented the project Parlamentemos (“Let’s parliament”): a webpage and a mobile app with information about the current members of the European Parliament from Catalan political parties, and about the candidates whose names we know so far (let’s not forget that the Popular Party, currently in government, has not yet named any of its candidates).

When our project becomes a reality, citizens should be able to use these tools to put questions to MEPs and get their answers. We would also like to link it to our open government project in Catalonia, opengov.cat, and extend it to other parliaments. (If you are a programmer and can help, please email us at opengovcat@gmail.com.)

OKFN team at the Barcelona GEN Editors Lab hackday

The hackday winner was La Vanguardia, and this was their project. The newspaper Ara.cat got a mention for “El inaugurómetro” (the inauguration-meter, relating building projects to results in local elections). The Corporació Catalana de Mitjans Audiovisuals (which groups together TV3, Catalunya Ràdio and other public media) got another mention for a project letting audiences use their mobiles to have a say in broadcast election debates in real time. Other participants were eldiario.es, El Periódico, Datan’Press, and Wikidiari.

La Vanguardia will compete in the final of the “newsroom world cup” at the GEN Editors Lab summit in Barcelona on 13th and 14th June 2014 against The Times, La Repubblica, La Nación, Radio France, and others. We hope to be able to inform you about it, too. Here is the GEN summit programme.


Data Roundup, 16 April

Marco Menchinella - April 16, 2014 in Data Roundup

Ana_Cotta – saudades da Amazônia

Tools, Events, Courses

On Wednesday the 30th, the eighth edition of the International Journalism Festival will take place in Perugia. The event has become one of the most important of its kind in Europe, and it will host hundreds of journalists from all over the world.

The IJF will also be the location of the third edition of the School of Data Journalism, jointly organized by the European Journalism Centre and the Open Knowledge Foundation. The School will start on 1st May and will see the participation of 25 instructors from world-leading newspapers, universities, and think tanks.

ProPublica just announced the release of two JavaScript libraries. The first, Landline, helps developers turn GeoJSON data into browser-side SVG maps. The second, Stateline, is built on top of Landline and facilitates the creation of US choropleth maps.
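The core idea behind such libraries is straightforward: project each GeoJSON coordinate into the viewport and emit an SVG path. Here is a minimal hand-rolled Python sketch of that idea (this is an illustration of the technique, not ProPublica’s Landline API):

```python
# Minimal sketch of GeoJSON -> SVG conversion: linearly scale the
# polygon's lon/lat coordinates into a pixel viewport and emit a path.

def geojson_polygon_to_svg_path(feature, width=400, height=400):
    """Convert a GeoJSON Polygon feature into an SVG <path> element."""
    rings = feature["geometry"]["coordinates"]
    # Gather all points to compute the bounding box.
    pts = [p for ring in rings for p in ring]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    minx, maxx, miny, maxy = min(xs), max(xs), min(ys), max(ys)

    def scale(lon, lat):
        x = (lon - minx) / (maxx - minx) * width
        y = (maxy - lat) / (maxy - miny) * height  # flip: SVG y grows downward
        return x, y

    parts = []
    for ring in rings:
        cmds = ["M %.1f %.1f" % scale(*ring[0])]
        cmds += ["L %.1f %.1f" % scale(*p) for p in ring[1:]]
        cmds.append("Z")
        parts.append(" ".join(cmds))
    return '<path d="%s"/>' % " ".join(parts)

# A toy square "country" for demonstration.
square = {"geometry": {"type": "Polygon",
                       "coordinates": [[[0, 0], [10, 0], [10, 10],
                                        [0, 10], [0, 0]]]}}
print(geojson_polygon_to_svg_path(square))
```

A real library adds proper map projections, multi-polygon support, and styling on top of this, but the GeoJSON-in, SVG-out shape of the problem is the same.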

Data Stories

Chris Michael from the Guardian Data Blog recently published a short article listing the world’s most resilient cities. Michael drew on a study by Grosvenor, a London-based company, which measured resilience by assigning a value to cities’ vulnerability to environmental change and their capacity to face political or economic threats.

British citizens might be interested in the quality of the air they breathe every day. Those who are worried about air pollution should take a look at George Arnett’s interactive choropleth map showing the percentage of deaths caused by particulate air pollution in England.

What’s the role of the world’s tech giants in politics? Tom Hamburger and Matea Gold tried to explain it in this article in the Washington Post by observing the evolution of Google’s lobbying activities at the White House. Google’s political influence has increased enormously since 2002, making the company the second-largest spender on lobbying in the US.

Are all conservatives conservative in the same way, or is there a certain degree of moderation among them, varying by issue? On his newly launched FiveThirtyEight, Nate Silver tackles the question by displaying data on the “partisan split” between the two US parties on several main topics.

If you are Catholic, or maybe just curious, you should be very interested in The Visual Agency’s latest infographic, which represents, through a series of vertical patterns, the number, geographical area, and social class of the professions of all Catholic saints.

Gustavo Faleiros, an ICFJ Knight International Journalism Fellow, is about to present to the public InfoAmazonia, a new data journalism site which will monitor environmental changes in the Amazon region of South America using both satellite and on-the-ground data.

In addition, as environmental changes increase, so does the number of deaths of environmental and land defenders. The Global Witness team has just published its latest project, Deadly Environment, a 28-page report containing data and important insights on the rise of this phenomenon, which is expanding year by year, especially in South America.

Data Sources

Michael Corey is a news app developer who was involved in building the National Public Radio mini-site Borderland. In this post, he analyses the main features of the digital mapping tools he used to collect and display data on the US-Mexico border, which helped him correctly locate the fences built by the US government along the line that separates the two countries.

The data-driven journalism community is expanding rapidly, especially on Twitter. If you need a useful recap of what has been tweeted and retweeted by data lovers, the Global Investigative Journalism Network’s #ddj top ten is for you.


Case Study: Steve Kenei, Technical Analyst at Development Initiatives in Nairobi, Kenya

Zara Rahman - April 10, 2014 in Open Development Toolkit

As part of the Open Development Toolkit, we’ll be talking to people based in aid-recipient countries who work with international aid data, as the ‘research’ stage of the project, to work out what data users need and what would help them use data more effectively in their work. Here’s the first in the series of case studies, with Steve Kenei, who works at Development Initiatives in Nairobi, Kenya. Steve is a Technical Analyst supporting a team of data analysts and researchers; we spoke via Skype on April 10th, 2014.

Colleagues at Development Initiatives come to me to find out how/where to get the data they need, to convert it into a format they can work with, and to manipulate the data.

A common issue that I come across is people coming to me with PDFs, and needing the data in a reusable format; it’s an old problem, and it’s getting boring!

I normally work directly with the IATI Datastore, but for people with less technical knowledge, there are a number of problems with it. Even before you get to the technical skills needed to manipulate the data, the language used isn’t helpful for the average layperson: the words and terms assume that people already know what an ‘activity’, a ‘transaction’, or a ‘budget’ is, as the very first step. Then it’s difficult to know what you need in order to answer your question, or even whether your question can be answered with data from the IATI Datastore at all.

I usually point people towards donor portals, if they know who they want the data from—DFID’s DevTracker for example, or the UNDP’s open data portal. My colleagues also had a two-day training on using the OECD data, so now they can access this data themselves without any problems; this was really useful for them.

From my experience, people don’t often need disaggregated figures or even detailed geocoded data. They’re more interested in big aggregated figures that they can use in a ‘headline’ style rather than small, detailed data. For example, knowing how much is being spent in a certain country within a certain sector, or how much a certain donor is spending in a certain country, is usually adequate.

There was one time a colleague of mine wanted to know how much money was being spent on HIV/AIDS prevention and support in Western Kenya, but we couldn’t find anything. We looked at the IATI Datastore and directly on the Global Fund site, but we didn’t get anything.

If you are working with aid data in your work, we’d love to hear from you! Get in touch with zara.rahman[at]okfn.org, or drop @OpenDevToolkit a line on Twitter; your input would be so valuable to help us understand what we can best do to support you in your work!


The School of Data Journalism 2014!

Milena Marin - April 3, 2014 in Data Journalism


We’re really excited to announce this year’s edition of the School of Data Journalism, at the International Journalism Festival in Perugia, 30th April – 4th May.

It’s the third time we’ve run it (how time flies!), together with the European Journalism Centre, and it’s amazing to see the progress that has been made since we started out. Data has become an increasingly crucial part of any journalist’s toolbox, and its rise is only set to continue. The Data Journalism Handbook, which was born at the first School of Data Journalism in Perugia, has become a go-to reference for anyone looking to work with data in the news, a fantastic testament to the strength of the data journalism community.

As Antoine Laurent, Innovation Senior Project Manager at the EJC, said:

“This is really a must-attend event for anyone with an interest in data journalism. The previous years’ events have each proven to be watershed moments in the development of data journalism. The data revolution is making itself felt across the profession, offering new ways to tell stories and speak truth to power. Be part of the change.”

Here’s the press release about this year’s event – share it with anyone you think might be interested – and book your place now!


April 3rd, 2014

Europe’s Biggest Data Journalism Event Announced: the School of Data Journalism

The European Journalism Centre, Open Knowledge and the International Journalism Festival are pleased to announce the 3rd edition of Europe’s biggest data journalism event, the School of Data Journalism. The 2014 edition takes place in Perugia, Italy from 30th April to 4th May as part of the International Journalism Festival.

#ddjschool #ijf13

A team of about 25 expert panelists and instructors from the New York Times, the Daily Mirror, Twitter, Ask Media, Knight-Mozilla and others will lead participants in a mix of discussions and hands-on sessions, covering everything from cross-border data-driven investigative journalism to emergency reporting, spreadsheets, social media data, data visualisation, and mapping techniques for journalism.

Entry to the School of Data Journalism panels and workshops is free. Last year’s edition featured a stellar team of panelists and instructors, attracted hundreds of journalists, and was fully booked within a few days. The year before saw the launch of the seminal Data Journalism Handbook, which remains the go-to reference for practitioners in the field.

Antoine Laurent, Innovation Senior Project Manager at the EJC said:

“This is really a must-attend event for anyone with an interest in data journalism. The previous years’ events have each proven to be watershed moments in the development of data journalism. The data revolution is making itself felt across the profession, offering new ways to tell stories and speak truth to power. Be part of the change.”

Guido Romeo, Data and Business Editor at Wired Italy, said:

“I teach in several journalism schools in Italy. You won’t get this sort of exposure to such teachers and tools in any journalism school in Italy. They bring in the most avant garde people, and have a keen eye on what’s innovative and new. It has definitely helped me understand what others around the world in big newsrooms are doing, and, more importantly, how they are doing it.”

The full description and (free) registration for the sessions can be found at http://datajournalismschool.net. You can also find all the details on the International Journalism Festival website: http://www.journalismfestival.com/programme/2014


Antoine Laurent, Innovation Senior Project Manager, European Journalism Centre: laurent@ejc.net
Milena Marin, School of Data Programme Manager, Open Knowledge Foundation, milena.marin@okfn.org

Notes for editors

Website: http://datajournalismschool.net

The School of Data Journalism is part of the European Journalism Centre’s Data Driven Journalism initiative, which aims to enable more journalists, editors, news developers and designers to make better use of data and incorporate it further into their work. Started in 2010, the initiative also runs the website DataDrivenJournalism.net as well as the Doing Journalism with Data MOOC, and produced the acclaimed Data Journalism Handbook.

About the International Journalism Festival (www.journalismfestival.com)
The International Journalism Festival is the largest media event in Europe. It is held every April in Perugia, Italy. Entry to the festival is free for all attendees, for all sessions. It is an open invitation to listen to and network with the best of world journalism. The leitmotiv is one of informality and accessibility, designed to appeal to journalists, aspiring journalists and those interested in the role of the media in society. Simultaneous translation into English and Italian is provided.

About Open Knowledge (www.okfn.org)
Open Knowledge, founded in 2004, is a worldwide network of people who are passionate about openness, using advocacy, technology and training to unlock information and turn it into insight and change. Our aim is to give everyone the power to use information and insight for good. Visit okfn.org to learn more about the Foundation and its major projects including SchoolOfData.org and OpenSpending.org.

About the European Journalism Centre (www.ejc.net)
The European Journalism Centre is an independent, international, non-profit foundation dedicated to maintaining the highest standards in journalism in particular and the media in general. Founded in 1992 in Maastricht, the Netherlands, the EJC closely follows emerging trends in journalism and monitors the interplay between media economy and media culture. Each year it also hosts more than 1,000 journalists in seminars and briefings on European and international affairs.


Data Roundup, 2 April

Marco Menchinella - April 2, 2014 in Data Roundup

Ars Electronica – Brain Art

Tools, Events, Courses

The Argentinian La Nación Data Blog continues to foster citizen participation in collecting and using open data, this time by presenting VozData, a shared platform through which people can transform complex public documents into easily readable databases.

Do you want to put your visualization skills into practice? The Infographic Competition: Visualizing the Scale of the Brain gives you the opportunity to do it. Hurry up: the deadline is on April 30th.

Data Stories

The Guardian Data Blog recently published two interesting data journalism pieces we would like to recommend. The first is an article on Death Penalty Statistics written by Leila Haddou which summarizes the state of the art of executions in the world. The second is an interactive map showing country-by-country data on Europe’s young adults living with parents by Ami Sedghi and George Arnett.

If you are a car fanatic, you should take a quick look at Exploring Your Car’s European Roots, which displays the most important historical milestones in car and motor production.

The Data Desk of the Los Angeles Times recently released Crime L.A., a daily updated map which shows both violent and property crime trends in more than 200 neighborhoods of the city.

Google Flu Trends was certainly one of the biggest experiments in predictive analytics in recent history. Read Kaiser Fung’s point of view on why it was a failure, and Alexis Madrigal’s arguments in its defense.

Sam Wang published an article in The New York Times with data on autism, showing the gap between the attention paid to the topic by the press and the scientific evidence.

Cartography is surely much better now that the second version of the Map of the Internet has just been released. The display of the oceans and the lands of the virtual world absolutely deserves applause and five minutes of your time.

Data Sources

The Washington Data Community is about to launch a completely new version of its weekly newsletter. Subscribe and you will also get useful data job alerts.

A list of visualization tools is always worth reading. Code Geekz has assembled one containing 30 links that may interest you.


Tackling PDFs with Tabula

Marco Túlio Pires - March 27, 2014 in Scraping

School of Data mentor Marco Túlio Pires has been writing for our friends at Tactical Tech about journalistic data investigation. This post introduces a tool for extracting data from PDFs, and it was originally published on Exposing the Invisible‘s resources page.

PDF files are pesky. If you copy and paste a table from a PDF into a new document, the result will be messy and ugly. You either have to type the data by hand into your spreadsheet processor or use an app to do that for you. Here we will walk through a tool that does it automatically: Tabula.

Tabula is awesome because it’s free and works on all major operating systems. All you have to do is download the zip file and extract a folder. You don’t even have to install anything, provided you’ve got Java on your machine, which is very likely. It runs in your browser, and it’s all about visual controls: you use your mouse to select the data you want to convert, and you download it as CSV. Simple as that.

Though Tabula is still experimental software, it is really easy to use. Once you fire it up, it will open your web browser with a welcome screen. Submit your PDF file and Tabula will process your file and show you a nice list of page thumbnails. Look for the table you want to extract.

Finding the table

Click and drag to select the area of the table. Once you release, Tabula will show you the extracted data in a friendly format. If the data is fuzzy, try selecting a narrower area, removing the headers or the footnote, and so on. Play around with it a bit. You can either download the data as CSV and open it with a spreadsheet program, or copy it to the clipboard.

Extracted data

Once the data is in a spreadsheet, you may need to do a bit of editing such as correcting the headers. Tabula won’t be perfect 100% of the time – it’s very good with numbers, but it can get confused by multi-line headers.
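That header cleanup can also be scripted. Below is a minimal Python sketch of merging a two-line header like the ones Tabula sometimes produces; the CSV content is a made-up placeholder, not data from any real report:

```python
import csv
import io

# Hypothetical CSV as Tabula might export it: the header is split across
# the first two rows ("Asylum" / "applications"), followed by the data.
raw = """Year,Asylum
,applications
2011,100
2012,150
"""

rows = list(csv.reader(io.StringIO(raw)))

# Merge the first two rows into single headers, dropping empty cells,
# e.g. ("Year", "") -> "Year" and ("Asylum", "applications") -> "Asylum applications".
header = [" ".join(filter(None, pair)) for pair in zip(rows[0], rows[1])]

# Re-attach the merged header to the remaining data rows.
data = [dict(zip(header, r)) for r in rows[2:]]
print(header)
print(data[0])
```

For a handful of tables the spreadsheet is faster, but a script like this pays off when you have to re-extract the same report every month.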

For further information and resources, see the Tabula website.


Learning to Listen to your Data

Marco Túlio Pires - March 27, 2014 in Data Stories

School of Data mentor Marco Túlio Pires has been writing for our friends at Tactical Tech about journalistic data investigation. This post “talks us through how to begin approaching and thinking about stories in data”, and it was originally published on Exposing the Invisible‘s resources page.

Journalists used to carefully cultivate relationships with sources in the hope of getting a scoop or obtaining a juicy secret. While we still do that, we now have a new source to interrogate for information: data. Datasets have become much like those human sources – someone (or something!) that holds the key to many secrets. And as we begin to treat datasets as sources, as if they were interviewees we can ask meaningful and difficult questions, they start to reveal their stories, and more often than not we come across tales we weren’t even looking for.

But how do we do it? How can we find stories buried underneath a pile of raw data? That’s what this post will try to show you: the process of understanding your data and listening to what your “interviewee” is trying to tell you. And instead of giving you a lecture about the ins and outs of data analysis, we’ll walk you through an example.

Let’s take an example from The Guardian, the British newspaper that has a very active data-driven operation. We’re going to (try to) “reverse engineer” one of their stories, in the hope that you get a glimpse of what happens when you compile, clean, and analyse information, and of the kinds of decisions we make along the way to tell a story out of a dataset.

So, let’s talk about immigration. Every year, the Department of Immigration and Border Protection of Australia publishes a set of documents on immigration statistics down under. The team at The Guardian focused on a report called Asylum Trends 2011-2012, published last year. There’s a more up-to-date version available (2012-2013); by the end of this exercise, we hope you can use the newer version to compare it with the dataset used by The Guardian. Let us know in the comments about your findings.

The article starts with a broad question: does Australia have a problem with refugees? That’s the underlying question that helps make this story relevant. It’s useful to start a data-driven investigation with a question, something that bothers you, something that doesn’t seem quite right, something that might be an issue for a lot of people.

With that question in mind, I quickly found a table on page 2 with the total number of people seeking protection in Australia.

People seeking Australia's protection

Let’s make a chart out of this and see what the trend is. Because this is a pesky PDF file, you’ll need to either type the data by hand into your spreadsheet processor or use an app to do that for you. For a walkthrough of a tool that does this automatically, see the Tabula example here.

After putting the PDF into Tabula this is what we get (data was imported into OpenOffice Calc):


I opened the CSV file in OpenOffice Calc and edited it a bit to make it clearer. Let’s see how the number of people seeking Australia’s protection has changed over the years. Using the Chart feature in the spreadsheet, we can compare columns A and D by making a line chart.

Line chart

Take a good look at this chart. What’s happening here? On the vertical axis, we see the total number of people asking for Australia’s protection. On the horizontal axis, we see the timeline year by year. Between 2003 and 2008, there’s no significant change. But something happened from 2009 on. By the end of the series, it’s almost three times higher. Why? We don’t know yet. Let’s take a look at other data from the PDF and use Tabula to import it to our spreadsheet. Maybe that will show us what’s going on.

Australia divides refugees into two groups: those who arrived by boat and those who arrived by air, using the acronyms IMA and non-IMA (IMA stands for Irregular Maritime Arrivals). Let’s compare the totals of the two groups and see how they relate across the years covered by this report. Using Table 4 and Table 25, we’ll create a new table with the totals for the two groups. Be careful, though: the non-IMA table goes back to 2007, but the IMA table only goes back to 2008. Let’s create a line chart with this data.

image 5

What’s that? It seems that in 2011-2012, for the first time in this time series, the number of refugees arriving in Australia by boat surpassed the number landing by plane. The next question could be: where are all the IMA refugees coming from? We already have the data from Table 25. Let’s make a chart out of that, considering the period 2011-2012 – that would be columns A and E of our data. Here’s a donut chart with the information:
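The “surpassing” moment the chart reveals can also be checked programmatically once the two series sit side by side. A minimal Python sketch of that check, using made-up placeholder figures rather than the report’s actual numbers:

```python
# Find the first year in which boat arrivals (IMA) surpass air
# arrivals (non-IMA). The figures below are illustrative placeholders,
# not data from the Asylum Trends report.

ima     = {"2008-09": 3000, "2009-10": 5000, "2010-11": 6000, "2011-12": 8000}
non_ima = {"2008-09": 4000, "2009-10": 6000, "2010-11": 7000, "2011-12": 7000}

# Walk the years in order; stop at the first where IMA > non-IMA.
crossover = next((year for year in sorted(ima)
                  if ima[year] > non_ima.get(year, float("inf"))), None)
print(crossover)
```

Scripting the comparison like this is handy when a new edition of the report comes out: re-run it on the fresh totals and you immediately see whether the crossover holds.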

Donut chart

Afghanistan (deep blue) and Iran (orange) alone represent more than 64% of all IMA refugees in Australia in 2011-2012.

From here, there are a lot of routes we could take. We could use the report to take a look at the age of the refugees, like the folks at The Guardian did. We could compare IMA and non-IMA countries and see if there’s a stark difference and, if so, ask why that’s the case. We could look at why Afghans and Iranians are travelling by boat rather than plane, and what risks they face as a result. How does the data in this report compare with the data from the more recent report? The analysis could be used to come up with a series of questions to ask the Australian government or an immigration specialist.

Whatever the case might be, it’s worth remembering that finding stories in data should never be an end in itself. We’re talking about data built on the behavior of people, on the real world. The data is always connected to something out there; you just need to listen to what your spreadsheet is saying. What do you say? Got data?


Comparing Corruption Data: expedition in London on April 2nd, 2014

tingeber - March 26, 2014 in Data Expeditions, Follow the Money

School of Data is organizing a corruption data expedition in London next Wednesday, together with the engine room and CIVICUS World Alliance.

The expedition will involve teams wrangling, scraping, and analyzing different kinds of corruption data collected by civil society (everything from citizen reports to representative surveys to web-scraped data) and looking for ways to piece them together for better data-driven advocacy. We’ve got space for up to 25 people and will be running on Wednesday, April 2nd, from 9:30 AM until 4:30 PM at C4CC, near King’s Cross. You can find more information in this concept note; for background on CIVICUS’s work to support citizen data for accountability, click here.

You can also register on this form.

If you’re in London next week and interested in anti-corruption campaigning, or want to flex your data-wrangling muscles, sharpen your data skills, and mash up for the greater good, join us!


Data Roundup, 26 March

Marco Menchinella - March 26, 2014 in Data Roundup

Eowyn_86 – Salvem La Balena / Save The Whale

Tools, Events, Courses

It is true that JSON is increasingly becoming the standard API format, but what if you need to convert it into a much more organized CSV file? Thanks to Eric Mill, you can now do it quickly with this tool. Read more about it in “Making JSON as simple as a spreadsheet”.
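One common way such converters work is to flatten nested JSON keys into compound column names before writing the CSV. A minimal Python sketch of that idea (an illustration of the general technique, not Mill’s implementation):

```python
import csv
import io
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into dot-separated column names."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out

# A tiny hypothetical API response with one nested object.
records = json.loads('[{"name": "Ada", "contact": {"city": "London"}}]')
rows = [flatten(r) for r in records]

# Write the flattened rows out as CSV.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Lists inside the JSON need an extra policy decision (one row per element, or numbered columns), which is exactly the kind of choice a polished tool makes for you.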

Data Stories

Dave Guarino explains what ETL problems are and how we can succeed in solving them when we have to collect and reorganize datasets lost God-only-knows-where on the Web.

What is the difference between data storytelling and data narratives? Dino Citraro from Periscopic explains it in his short and brilliant post “A Framework for Talking About Data Narration”.

It has not even been a week since the launch of Nate Silver’s FiveThirtyEight, and it is already attracting criticism from influential experts in the field, journalists and non-journalists alike. If you are interested in a roundup of these opinions, take a look at Alberto Cairo’s blog and at Mark Coddington’s article for the Nieman Journalism Lab.

The most honorable use of data collection and analysis is certainly that which helps people improve their lives. Katie Fehrenbacher recently posted an article on Gigaom on how data can fight human trafficking around the world, which absolutely deserves to be read.

Have you ever wondered how our brain stores information? Do you want to know what the main task of your hippocampus is? Find all the answers in this infographic published on the Daily Infographic.

What prevents the emergence of open data-driven businesses in emerging countries? Is it possible to fuel innovation in these states through the establishment of a new investment fund? According to Prasanna Lal Das on the World Bank Data Blog, this is the right time to do it.

Data Sources

Public transportation is not always easily accessible to everyone: people with disabilities or injuries may not be able to move freely on a metro. Mappable created an interesting series of maps showing which metro stations in Hamburg, London, and New York are considered wheelchair accessible.

The article also contains a link to Wheelmap, a web app developed to monitor and display wheelchair accessibility around the world.

There are always experts offering useful suggestions on how to deal with open data, but these come from the father of the Web, Tim Berners-Lee, and you should not miss them. If you read this 5-star deployment scheme, you will find interesting insights on the costs and benefits of web data.

To celebrate World Water Day, the US open data portal data.gov released a series of datasets related to American coasts, oceans, and lakes, which you can browse at data.gov/ocean.


Thanks to @jalbertbowdenii @zararah @DataAtCU


Learning From Syria

Matt McNabb - March 24, 2014 in Data Journalism


Over the past few months, we have had the pleasure of being engaged in what may be one of the most impactful and important technology projects to come out of the unfolding human tragedy in Syria. In partnership with our friends at Caerus, a Washington-based strategy consultancy focused on fragile and conflict states, we have had the opportunity to generate high-fidelity insights from within Aleppo on a neighborhood-by-neighborhood basis. We have seen everything from the rise of the radicalized Islamic State of Iraq and Syria (ISIS), as it assumed control of the city’s eastern cusp, to wild fluctuations in the price and availability of staple food items. No longer is the human tragedy a fuzzy buzz of punditry; the picture can be, and increasingly is, a clear one.

This is not just a story about Syria, however. It is a story about how technology can bend to the realities of war, and how, with those advancements, episodic journalism can give way to sustained insights that can be used to enhance and measure efforts by the humanitarian community to take action in support of our common humanity.

Insights in Conflict-Affected Areas

Research in a conflict-affected context is always a challenge, especially where mapping is concerned. Increasingly, humanitarian organizations and research shops have adopted one of the over two dozen different Android- or iPhone-enabled mobile data collection tools for capturing data. And in many contexts, this approach makes perfect sense. Going mobile means instant digitization. It’s clean. It’s straightforward.

The problem is that it can also be dangerous. Trotting around with a mobile phone to collect data can make you a target, especially where persistent data collection may be concerned. The sequence that it forces on the user (one question after another) belies the more conversational approach often employed by trained researchers in these areas. This is why we used paper. Fold it up, move about, and the methods are simple and flexible.

The challenges don’t end once data has been collected, though. Traditional methods for producing findings tend toward static tomes: 80-to-90-page reports accessible only to the few privileged with the time to navigate them. By the time dynamics in a community shift, the findings have run cold. By providing Caerus the ability to interact with the data at both the analyst level through a detailed Map Explorer and at the automated dashboard level, we enabled the data to come alive through interaction. We believe this is the future of conflict monitoring.

The challenge emerges, however: what to make visible… what to share.

Open Data vs. Do No Harm

We believe in the virtues of open data: that data itself can serve as infrastructure for not only a vast array of public goods like healthcare and public safety, but also as a mechanism for catalyzing private growth in industry. Like other public goods, data can drive value well beyond the intended purposes of its original collectors. Our friends at the World Bank have increasingly led the way in showing this for international development, as UNOCHA has shown its merit in humanitarian response.

But the principles underlying open data have their limits in conflict-affected areas, where the contest of information can—and often does—emerge more frequently to support actions that may have a deleterious effect on citizens’ safety or well being. Perhaps obvious to say, data in war can be a dangerous thing.

This introduces a second principle which, we believe, trumps the first: the humanitarian principle of do no harm. Data can always be used for good or ill. One could say the same for really any public good, whether roads or the internet. But amidst an active conflict, the stakes become all the more serious, and the weighted balance for greater good appears differently.

As we have learned from our work, however, the Syrian regime has systematically targeted citizens standing in line at bakeries to buy bread for their families. Providing the locations of each bakery and its status across the city of Aleppo to any party without due care around its use would simply be reckless and may put the lives of innocent civilians at considerably higher risk. This would flatly violate the very essence of “do no harm”.


We struggle with this challenge, balancing between the imperative to do no harm and the virtues of transparency and openness. We are not alone in this struggle. On the eve of International Open Data Day, we have made the determination—in partnership with our software’s users at Caerus—to make the data available, moderating its distribution solely to align with best efforts to do no harm. Collectively, we have chosen not to apply restrictive, proprietary standards to this initiative, in the belief that transparently providing not only the findings from this work but also the underlying data on which it relies to anyone wishing to use it is in alignment with these principles to do no harm. We believe that this is the most prudent way of balancing the competing principles.

But if nothing else comes out of our experience, it surely must be that more work is needed to establish clear, understood, and shared humanitarian guidelines for supporting the open data movement even amidst an active conflict.

From Analysis to Participant

One severe limitation of our work in Syria, however, is that it ends at the water’s edge. First Mile Geo and the power of connecting local insights to cloud analytics need not be used purely to support the work of seasoned researchers like those at Caerus. Rather, the power of First Mile Geo is the ability to connect local insights to local action.

Early indications from Aleppo point to the use of geospatial and other forms of data visualization by local civil society and the ad hoc governance structures that have emerged to provide services in opposition areas. As local councils and other civil society ventures appear, the international community has an opportunity to support a commensurate effort to help them become data-driven. When local organizations begin benefitting from the data they are capable of generating beyond its use for donors and others outside the country, they gain the opportunity not only to optimize scarce resources smartly but also to form a longitudinal baseline crucial for benchmarking and prioritization by their partners. Filling this very gap of Software as a Service in the first mile is why First Mile Geo was formed.

Have a look at our project on Syria for more.
