The Data Journalism Bootcamp at AUB Lebanon

Ali Rebaie - January 29, 2015 in Data Journalism, Events, Fellowship

Data love is spreading like never before. Unlike previous workshops we did in the MENA region, this time we gave an intensive four-day data journalism workshop at the American University of Beirut, starting on the 18th of January 2015, in collaboration with Dr. Jad Melki, Director of the media studies program at AUB. The data team at Data Aurora was really happy to share this experience with students from different academic backgrounds, including media studies, engineering and business.

The workshop was mainly led by Ali Rebaie, a Senior School of Data fellow, and Bahia Halawi, a data scientist at Data Aurora, along with the data community team assistants: Zayna Ayyad, Noor Latif and Hsein Kassab. The aim of the workshop was to give the students an introduction to the world of open data, and data journalism in particular, through tutorials on the open source tools and methods used in this field. Moreover, we wanted to put students on track regarding the use of data.

On the first day, the students were introduced to data journalism from a theoretical angle, in particular the data pipeline, which outlines the different phases of any data visualization project: find, get, verify, clean, analyze and present. After that, students got hands-on with scraping and cleaning data using tools such as OpenRefine and Tabula.
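To make the "clean" phase of the pipeline more concrete, here is a minimal sketch of key-collision clustering, the general idea behind the fingerprint method that OpenRefine offers for merging spelling variants. It is a simplified illustration and not OpenRefine's actual code: the function names and the sample values are made up, and only ASCII punctuation is handled.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified fingerprint: trim, lowercase, drop ASCII punctuation,
    then sort and de-duplicate the whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group raw values whose fingerprints collide; keep only groups with variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical messy column, invented for illustration.
raw = [
    "American University of Beirut",
    "american university of beirut ",
    "Beirut, American University of",
    "Lebanese University",
]
clusters = cluster(raw)
for key, variants in clusters.items():
    print(key, "->", variants)
```

Values that end up in the same cluster are candidates for merging into one canonical spelling; a person still needs to review them, since a key collision does not prove two values mean the same thing.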

Day two was all about mapping, from mapping best practices to mapping formats and shapes. Students were first exposed to different types of maps and to the design styles that serve the purpose of each map. Best mapping techniques and visualizations were also covered, with an emphasis on the purpose each one serves. By the end, participants were able to differentiate between dot maps, choropleth maps and many others. Then they used Twitter data that contained geolocations to contrast varying tweeting zones by placing these tweets at their origins on CartoDB. Similarly, they created other maps using QGIS and TileMill. The mapping exercises were really fun and students were very happy to create their own maps without a single line of code.

On the third day, Bahia gave a lecture on network analysis, covering some important mathematical notions needed for working with graphs as well as possible uses and case studies related to this field. Meanwhile, Ali unveiled different open data portals to provide the students with more resources and data sets. After these topics were covered, a technical demonstration was performed on using a network analysis tool to analyze two topics. Students first analyzed climate change, and later the AUB media group on Facebook was analyzed and its graph drawn. It was very cool to find out that one of the top influencers in that network was among the students taking the training. Students were also taught to do the same analysis for their own friends’ lists: Facebook data was collected and the visualizations were drawn in a network visualization tool.
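For readers curious about what "top influencers" can mean in a graph, here is a minimal sketch using degree centrality, one of the simplest network measures. The post does not say which measure the workshop tool used, so this choice is an assumption, and the names and edges below are invented.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return each node's degree divided by (n - 1), for an undirected graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def top_influencers(edges, k=3):
    """Rank nodes by centrality, breaking ties alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical interactions in a group (who commented on whose post).
edges = [
    ("rana", "omar"), ("rana", "lina"), ("rana", "sami"),
    ("omar", "lina"), ("sami", "nour"),
]
ranking = top_influencers(edges, k=2)
print(ranking)
```

Other measures, such as betweenness or PageRank, can rank the same network differently, so it is worth checking which one a tool reports before naming anyone an influencer.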

After completing the interactive types of visualizations, the fourth day was about static ones, mainly infographics. Each student had the chance to extract the information needed for an interesting topic and transform it into a visual piece. Bahia worked with the students, teaching them how to refine the data so that it becomes simple and short, and thus usable for building the infographic design. Later, Yousif, a senior creative designer at Data Aurora, trained the students on the use of Photoshop and Illustrator, two of the tools commonly used by infographic designers. At the end of the session, each student submitted a well-made infographic, some of which are posted below.

After the workshop, Zayna chatted with the students to get their feedback. Here are some of the opinions she quoted:

Static infographics developed by the students at the workshop.

“It should be a full course, the performance and content was good but at some point, some data journalism tools need to be more mature and user-friendly to reduce the time needed to create a story,” said Jad Melki, Director of the media studies program at AUB, “it was great overall.”

“It’s really good but the technical parts need a lot of time. We learned about new apps. Mapping, definitely I will try to learn more about it,” said Carla Sertin, a media student.

“It was great we got introduced to new stuff. Mapping, I loved it and found it very useful for me,” said Ellen Francis, civil engineering student. “The workshop was a motivation for me to work more on this,” she added, “it would work as a one semester long course.”

Azza El Masri, a media student, is interested in doing an MA in data journalism. “I like it. I expected it to be a bit harder, I would prefer more advanced stuff in scraping,” she added.



Memories from San Jose

escueladedatos - January 29, 2015 in Data Expeditions

This article was originally posted in Spanish at Escuela de Datos by Phi Requiem, School of Data fellow in Mexico.

Last November, the Open Government Partnership (OGP) Summit took place in Latin America. CSO participants from 18 countries got together to share and exchange in an “unconference” where many topics were discussed. It was really interesting to learn how data is handled in different countries, and to pinpoint the similarities and differences between our contexts.

After a few words from the President of Costa Rica and other government representatives, a series of talks and roundtables began… And then, in parallel, Antonio (School of Data fellow in Peru) and I started a datathon.

In this datathon, our task was to give training and support to the five teams asking questions of the dataset on the commitments of the OGP countries, which can be found here → Action Plan Commitments and IRM Data.

The first step was to approach the data and structure it. After this, it was time to pose the questions we wanted to answer through the analysis of this data, and a lot of great questions (and interesting purposes) arose – many more than time allowed us to develop further. Teams picked the topics that seemed most relevant to them.

Teams were already working on their analysis at 9 sharp the following morning, while OGP San Jose sessions were taking place. The datathon participants looked for more data, did cross-comparisons, scraping, etc. By noon, they had found results and answers – it was time to start working to present them in visualizations, infographics, maps, articles, etc. At 3PM, the teams impressed us with their presentations, and showed us the following outcomes:

  • Team Cero Riesgos: Generating information on risks by area. Data: OIJ, Poder Judicial.
  • Team Accesa: Comparing the perception of Latin American citizens on current topics in the LatinoBarometer with the commitments and achievements per country. The goal: to know if governments are responding to citizen concerns.
  • Team E’dawokka: Comparing the agendas and priorities of Central America with those in the rest of Latin America.
  • Team InfografiaFeliz: What countries look like in the Human Development Index in terms of their anti-corruption measures (and their success).
  • Team Bluffers: Measuring the percentage of delay and achievement of the commitments acquired by each country, and relating the design process for the commitments (measured by their relevance and potential impact) and their achievement.

At the end of the day, the jury chose teams InfografiaFeliz and Accesa as winners (which earned them a prize in cash).

This was the first data expedition in Costa Rica.

What I take away from my experience in this expedition is that people are always willing to learn and create, but not everyone is aware of what open data is, or how it can be useful for them. Initiatives of this sort are achieving their mission, but are insufficient – and that’s why we need to keep in touch with the participants and encourage them to share their experiences, and, why not: to replicate these initiatives.

Here are some tips for people with an interest in running data expeditions:

  • It’s difficult to explain the difference between a hackathon and a data expedition… But, the earlier this is out of the way, the better.
  • There must be a conceptual baseline. With such limited time it’s difficult to give introductions or previous workshops, but trying to do a bit of this can be really useful.
  • Teams always have good ideas to handle information and show conclusions, but many times impose limitations on themselves because they think the technical barriers are huge. Having a hackpad or Drive folder with examples and lists of tools can help people overcome that fear.


Digital Methods Initiative Winter School, University of Amsterdam

Sam Leon - January 27, 2015 in Events

Exploding book in the pulpit of the De Krijtberg Church in Spui, Amsterdam where some of the sprint took place

Exploding book in the pulpit of the Algemene Doopsgezinde Sociëteit in Spui, Amsterdam where some of the sprint took place

Last week I attended the 7th annual Winter School at the University of Amsterdam. Run by the Digital Methods Initiative, it took the form of a data sprint in which students joined professional developers and designers to answer research questions using social media data.

The DMI group at Amsterdam have developed and collated a suite of easy-to-use tools specifically for this kind of research. They are well worth checking out for anyone interested in this field and they cover a range of techniques from web scraping to list triangulation, and can be found online here.

I joined a group looking at bias across three APIs through which you can acquire Twitter data: the Search API, the Stream API and the proprietary Firehose endpoint – generally regarded as the most complete source of Twitter data. We had three sets captured from the three separate APIs for a critical period between 7th and 15th October 2014 when the Hong Kong protests were taking place.
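As a rough illustration of what comparing three captures of the same period can look like, here is a minimal sketch that works on sets of tweet IDs. It is not the code our group wrote during the sprint: the function names and the tiny ID sets are hypothetical, and real captures would first need to be loaded from files or a database.

```python
def coverage_report(captures):
    """For each capture, report its size, its share of the union, and IDs found nowhere else."""
    union = set().union(*captures.values())
    report = {}
    for name, ids in captures.items():
        others = set().union(*(v for k, v in captures.items() if k != name))
        report[name] = {
            "count": len(ids),
            "share_of_union": len(ids) / len(union) if union else 0.0,
            "unique": sorted(ids - others),
        }
    return report


def jaccard(a, b):
    """Overlap between two ID sets: size of the intersection over size of the union."""
    combined = a | b
    return len(a & b) / len(combined) if combined else 1.0


# Hypothetical tweet IDs per API, invented for illustration.
captures = {
    "search": {1, 2, 3, 5},
    "stream": {1, 2, 3, 4, 6},
    "firehose": {1, 2, 3, 4, 5},
}
report = coverage_report(captures)
for name, stats in report.items():
    print(name, stats)
print("search vs stream:", jaccard(captures["search"], captures["stream"]))
```

IDs that appear in only one capture are the interesting ones to inspect by hand, since they hint at where the sources differ in what they return.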

Other groups took on a range of tasks from mapping the open data revolution to tracking the global climate change debate. All projects deployed a range of data wrangling techniques to investigate these complex social, political and cultural phenomena.

A few things I learned:

  • Anyone wanting to use social media data to answer research questions about society and culture needs more than just spreadsheet skills. These datasets are generally larger than what Excel can comfortably handle, so basic database skills are a massive help.
  • Off-the-shelf tools for data analysis are brilliant, but often you need to tweak lines of enquiry to fit your specific research question. Having some knowledge of programming means that you can take a much more flexible approach than when relying on the GUI tools.
  • Working in such a collaborative fast-paced environment meant that reproducibility (i.e. where different parts of the team would re-use scripts and code developed by other parts of the team) was essential, alongside creating documentation on the fly. We found IPython notebooks especially useful for this, whereas analytical steps taken in Excel were harder to reproduce.
  • Free Twitter data – like that which can be acquired from the Search and Stream API – is still good, and sometimes better than that which you get through the proprietary APIs. When investigating online reactions to contentious and controversial events – such as the Hong Kong protests – tweets will inevitably be removed both by users and Twitter. If you want to get the full story, it’s far better to scrape data as it comes in through the streaming API.
  • We’ve written about it before on this blog but the Pandas module for Python is brilliant for data wrangling and analysis and well worth getting to know if you plan on working with big datasets. It’s quick, flexible and powerful.
  • Nothing beats hands-on learning when it comes to technical skills. Having a motivating research question and some real life data is the best way to learn how to use the multitude of tools now at any budding data wrangler’s disposal. I learnt more in a week than I could have in months reading about tools and languages in the abstract!

For those interested in attending a DMI school in the future – take a look at the summer school coming later in 2015.


Data literacy needs within the Follow the Money network

Zara Rahman - January 26, 2015 in Events, Follow the Money

Last week, I joined a meeting hosted by the Transparency and Accountability Initiative around ‘Follow the Money’. It brought together people working on various aspects of the money trail, from access to information, to developers, investigative journalists, campaigners and activists, to think about how we can better collaborate in the future, and where the gaps are in the network.

Data Pipeline

I had the pleasure of running a couple of School of Data related sessions, too – one short skillshare running through the ‘data pipeline’, and a longer session building out a ‘follow the money’ focused data pipeline, focused mainly on gathering various data sources on topics in this field. The pipeline, in its rough format, is online here, and I’ll publish it in a more accessible format on the School of Data site soon too.

The value of asking questions

These sessions made me think about how data literacy skills could be developed within this community, and what is really needed to support and further the work of Follow the Money initiatives. Pragmatically speaking, for technology and data to be engaged and used successfully to further people’s work, not everyone in that room needs to be a superstar data wrangler or developer. What they do need, though, is to know where the people with technical expertise are, and to be able to ask them for assistance.

In the ‘thanks’ at the end of the workshop, lots of us mentioned how much we appreciated being in a space where, as our facilitator Allen Gunn said, ‘asking a question is considered to be a heroic act of leadership’ rather than a signal of a lack of knowledge. It was obvious that we valued most the patience and understanding of those around us who have higher levels of knowledge in a certain field, be that topical expertise, or technical; and that for many, the opportunity to ask these technical questions comes far too rarely.

This made me think about the value of the School of Data community – in my follow up emails from the workshop, I’ve been connecting people from various countries and contexts to former fellows who are based near them, or people running local groups in neighbouring countries, who can help them in person as well as online with their data-related queries. From past experience of seeing how well our data trainers and community members work with civil society groups with lower levels of data literacy, I’m optimistic that this will work out well – whether it be simply exchanging a few emails, or working with the community members or us at School of Data central to commission actual in person trainings.

Data wrangling + topical expertise = effective data-driven campaigning

As I mentioned, these connections provide a somewhat pragmatic solution to a need for better use of data among the community. Ideally, however, we would have people based within these organisations for long term support, who have both topical expertise and data wrangling skills.

And from what I heard, the need for this skillset will become extremely pronounced in the coming years; various directives and new laws regarding data availability and transparency sitting at different points of the money trail will be coming into force over the next couple of years, and they will bring with them a deluge of data. For example, there will be data on extractives following Section 1504 of the Dodd-Frank Act, and company data following the EU Accounting and Transparency directives. What stories lie within that data, and how can we uncover them?

Many of the people and organisations represented at the Follow the Money workshop have been instrumental in campaigning for those transparency directives; but how many of those organisations possess the in-house ability to actually process and use that data? Effectively, the next round of campaigning should be based on stories that come out of that hard-fought-for data – but for that to happen, we need to start preparing now, by building data and technical skills among our communities.

Laying the groundwork for data storytelling

So, how can we start doing this? It could be through providing support for current employees of organisations to attend data expeditions or data skills courses on an ongoing basis; not just one off workshops, but people learning skills that are clearly relevant to their work, and having regular refresher courses to keep it relevant and in their minds. Or, (apologies for the blatant self-promotion here!) – it could be through supporting topical School of Data fellows to be based within the community and provide ongoing support, focusing on a specific topic – like extractives, or corporate money flows, for example.

Our experiences from the 2014 fellowships have led us to believe that the fellowship scheme is a sustainable and successful method of building up capacity both in terms of finding and supporting data storytellers and trainers (the Fellows), and equipping them with the skills they need to provide ongoing support to organisations based in their area, with whom they share their skills. Last year, the fellows carried out activities ranging from regular workshops with local organisations, to data clinics and expeditions for newcomers to get hands on with data, to simply being present within organisations as in-house support.

From what I saw last week, a lot of organisations within the Follow the Money network could do with this support. The earlier we start developing this capacity, the better equipped we will be as a community to start delving into the avalanche of data that is soon to come our way.

If you want to find out more about the Fellowship scheme, see the section ‘Fellowship Programme’ on our 2014 Annual Report, and if you’d like to talk about supporting a fellow through our upcoming 2015 scheme, get in touch with me on zara.rahman [at]


Be Smart with Spreadsheets

Rita Zagoni - January 24, 2015 in Community, Events

We organized three spreadsheet workshops in Budapest in September and December 2014 as part of the Be Smart with Data project (site in Hungarian). The initiative aims at promoting the use of open data and building data skills among NGOs and journalists in Hungary to make them more effective in their advocacy work, as well as in assessing their impact and communicating their results. The participating organizations work in a variety of fields from international development through media monitoring and human rights to lobbying for transparency of local governments, and the project has an added focus on monitoring public spending in these fields.



The first trainings took place on September 9 and 11. Before the events, we sent out a questionnaire to get an idea of what skills participants already had and what they were interested in learning. The responses showed that tools for data analysis and visualization would be the most useful for their daily work, so we decided to cover formatting tricks, functions, pivot tables and visualization with charts. To make the work more effective, we split participants into a beginner and an advanced group, and customized the material to the different skill levels, covering basic formatting and functions in the former and advanced formatting, pivot tables and charts in the latter. We worked with real world data on school performance and on migration, published by the Office of Education and the Hungarian Central Statistical Office.
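For those who wonder what a pivot table actually computes, the operation can be sketched in a few lines of Python. This is only an illustration of the idea, not material from the workshops, which used spreadsheets; the function name and the school records below are invented.

```python
from collections import defaultdict


def pivot_mean(rows, index, columns, values):
    """Average `values` for every (index, columns) pair, like a spreadsheet pivot table."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = (row[index], row[columns])
        sums[key] += row[values]
        counts[key] += 1
    table = defaultdict(dict)
    for (r, c), total in sums.items():
        table[r][c] = total / counts[(r, c)]
    return {r: dict(cols) for r, cols in table.items()}


# Hypothetical school performance records, invented for illustration only.
rows = [
    {"county": "Pest", "year": 2012, "score": 60.0},
    {"county": "Pest", "year": 2012, "score": 70.0},
    {"county": "Pest", "year": 2013, "score": 68.0},
    {"county": "Baranya", "year": 2012, "score": 55.0},
    {"county": "Baranya", "year": 2013, "score": 65.0},
]
table = pivot_mean(rows, index="county", columns="year", values="score")
for county, by_year in table.items():
    print(county, by_year)
```

Here the counties become the rows of the pivot table, the years become its columns, and each cell holds the average score for that combination.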



A slight piquancy was added to the events by the fact that we were holding the workshops during the hours when the Hungarian police saw fit to raid the offices of NGOs distributing Norway Grants.




Talking about Open Data in Digital Democracy Meetup Indonesia

yuandra - January 19, 2015 in Community, Events

In the middle of December, the Indonesian Digital Democracy Forum organized a large meetup for Indonesian digital activists who are involved in the movement for pushing democracy in Indonesia via digital media. The meetup was a two day event that was attended by about 30-50 digital activists, from various backgrounds. The meetup was organized as a kind of mini-conference, where there were several breakout rooms, each with different sessions focusing on specific digital democracy themes, such as open data, or internet freedom.

On the theme of open data, there was a very interesting discussion on how the open data movement can help in strengthening digital democracy in Indonesia. One of the examples shown was the story of KawalPemilu, a platform for voter count verification which uses crowdsourcing; it was created by just five people, and yet played a pivotal role in Indonesia’s 2014 presidential election as a checking mechanism for the voter count. The platform allows citizens of Indonesia to see and verify the voter count of the election and to check if anything is amiss. It stood as a strong example of citizen participation in the Indonesian elections.

Regarding the state of open data more generally in Indonesia, there was an overall acknowledgement that the movement is still in its early stages. There is still a great need to raise awareness of its importance and uses, as well as to build skills around how to actually work with the data. Also notable in the discussion was the importance of collaboration between the various open data actors from both the supply and demand side, such as government, CSOs, and citizens.

Meetups like these provide a great platform for these actors in the open data space to actually connect and collaborate with each other – having these more frequently would strengthen the movement as a whole.


School of Data in 2014: Annual Report

Zara Rahman - January 14, 2015 in Review

We’re very happy to share with you today our 2014 Report, which you can see online here:

It includes a run through of our major activities from 2014: our very first Summer Camp, held in Berlin in July 2014; the 2014 fellowship scheme, which saw 12 data training leaders from across the world join us as our Class of 2014 School of Data fellows; highlights from their fellowship and activities carried out; our work supporting advocacy organisations directly; and the progress made on our online and offline training materials, to name just a few highlights.

It was inspiring to see how much our community got done during the last year – and sadly impossible to include everything that happened in the report, but hopefully this gives a good taster.

About School of Data

School of Data’s mission is to empower the citizens and organisations who wish to use data ‘for good’, with the skills they need. We work with civil society organisations and journalists, teaching them to use data to find evidence, create compelling visualisations and tell stories to present their arguments in a more effective way.

We believe in “learning by doing” and using fun, hands on methods to engage in data storytelling. Together with our global community, we have developed online training materials in multiple languages, and carried out offline trainings in partnership with local organisations in more than 30 countries worldwide, as well as provided support and advice to organisations wanting to become more data-driven.


If you are interested in supporting the upcoming 2015 Fellowship scheme, or knowing more about the Fellows (both current and future) – get in touch with Zara Rahman on zara.rahman[at]

If you would like to know more about other School of Data activities, including trainings and events, collaboration opportunities and organisational support please email Programme Director Milena Marin on milena.marin[at]


Two-day data training in Macedonia

dona - January 5, 2015 in Data for CSOs, Skillshare

At the end of November (26th-28th), in Dojran – a city in Macedonia – we held a two-day training with Milena. Tailoring it to the needs of the 24 participants from different CSOs, we tried to cover as many as possible of the topics we had narrowed down using the form we composed to determine their skills and needs.



We started the training with a basic introduction to what data is and where to find it, and on the first day we mostly focused on working with spreadsheets, formulas and pivot tables. The next day we shared some thoughts and skills on data visualization, worked with different online data visualization and mapping tools and talked about creating beautiful timelines. The agenda for the training is here for everyone to check, use and adapt.


During the two days we tried to be as flexible as possible and adapt to the real time needs of the participants, and to engage everyone in a more interactive way of learning through practical exercises and teamwork.

Here you can see some more photos from the training.


Breaking Borders: The #OpenData Party in Accra Ghana

olubabayemi - December 31, 2014 in Events

As the last in our series of Open Data advocacy events through capacity building, we finally held a data clinic session at the Asa Royal Hotel in Accra, Ghana on Tuesday, December 9, 2014, which coincided with International Anti-Corruption Day, and CSOs in Accra weren’t left out. Why did we take this gospel to Ghana? We have enjoyed close collaboration and relationships with start-ups and NGOs in Ghana, and for them, one of the drawbacks in finding data is the lack of a freedom of information act or access to information act.

Just as we have seen in Nigeria, NGOs and activists do not seem familiar with data pipelines, or what we refer to as the data management process, nor with the basic tools that can be used in analyzing data. Unlike in Nigeria, the transparency and accountability [T&A] movement in Ghana is coordinated under the STARGHANA project, thus creating an ecosystem of groups working in the T&A component of the Open Data movement. “Two years ago, I was part of a team that initiated the SMS reporting on service delivery in the health sector, however, I am not sure how much the system is working anymore,” explained Joseph Senyo, National Director of Programmes, Community and Family Aid Foundation.

Open Data Party in Accra Ghana

Participants at the Open Data event in Accra Ghana

While going through finding data, it was interesting to learn that Nigeria has more datasets available online than Ghana. Most of the participants couldn’t figure out where to find the budget data of the country; some mentioned the ministry of finance, but surprisingly we couldn’t get budget data from its website. Nevertheless, the country’s national statistics online portal is a one-stop shop for datasets in the country, and only one of the participants knew it existed. Analyzing data using Microsoft Excel and Google Spreadsheets was an eye-opener for participants, and most of them asked how this could be applied in their various lines of work.

While it was important to drive this conversation forward, outside the training sessions the participants were already thinking about a 3-day event that could bring together government, NGOs and other activists in the coming year. But our trip to Accra would not have been complete without taking some time at the iSpace (it was a women in technology day, and we had ladies) and the LaBadi Beach – it is known that trainings can also be complemented with ice breakers on the beach – and so we did, and fortunately for us it was reggae night.


Getting instant feedback from participants

“We would have liked to have more days of training, as the little minutes I spent was quite educative, especially the use of analysis tools, thus making me to know how important data is to my various monitoring and evaluation work,” said Mensah Ileom of Inspire Africa. Actually, I have seen more NGO participants looking at how data gathering can also help them in monitoring and evaluation, aside from using it for advocacy and monitoring service delivery.



Mobile data collection

Joachim Mangilima - December 16, 2014 in Skillshare, Tech

This blog post is based on the School of Data skillshare I hosted on mobile data collection. Thanks to everyone who took part in it!

Recently, mobile has become an increasingly popular method of data collection. This is achieved through having an application or electronic form on a mobile device such as a smartphone or a tablet. These devices offer innovative ways to gather data regardless of the time and location of the respondent.

The benefits of mobile data collection are obvious, such as quicker response times and the possibility to reach previously hard-to-reach target groups. In this blog post I share some of the tools that I have been using and developing applications on top of for the past five years.

  1.       Open Data Kit

Open Data Kit (ODK) is a free and open-source set of tools which help researchers author, field, and manage mobile data collection solutions. ODK provides an out-of-the-box solution for users to:

  • Build a data collection form or survey;
  • Collect the data on a mobile device and send it to a server; and
  • Aggregate the collected data on a server and extract it in useful formats.

ODK allows data collection using mobile devices and data submission to an online server, even without an Internet connection or mobile carrier service at the time of data collection.
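The offline behaviour described above is often called store-and-forward: completed forms are saved on the device and uploaded once a connection is available. The sketch below shows the general pattern only and is not ODK's implementation; the `SubmissionQueue` class, the `send` callback and the sample forms are all hypothetical.

```python
import json
import sqlite3


class SubmissionQueue:
    """Hold completed forms locally until they can be uploaded."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending "
            "(id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def save(self, form):
        """Store one completed form as JSON."""
        with self.db:
            self.db.execute(
                "INSERT INTO pending (payload) VALUES (?)", (json.dumps(form),)
            )

    def pending_count(self):
        return self.db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]

    def flush(self, send):
        """Try to upload each pending form in order; stop at the first failure."""
        sent = 0
        rows = self.db.execute("SELECT id, payload FROM pending ORDER BY id").fetchall()
        for row_id, payload in rows:
            try:
                send(json.loads(payload))
            except OSError:  # covers ConnectionError and timeouts
                break
            with self.db:
                self.db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            sent += 1
        return sent


def offline(form):
    """Stand-in for an upload attempt made without connectivity."""
    raise ConnectionError("no network")


queue = SubmissionQueue()
queue.save({"village": "A", "households": 12})
queue.save({"village": "B", "households": 7})

sent_offline = queue.flush(offline)           # nothing leaves the device
received = []
sent_online = queue.flush(received.append)    # stand-in for a real upload
print(sent_offline, sent_online, queue.pending_count())
```

A form is deleted from the local queue only after its upload succeeds, so a dropped connection never loses data; the trade-off is that a server may occasionally receive the same form twice and needs to cope with that.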



ODK, which uses the Android platform, supports a wide variety of questions in the electronic forms such as text, number, location, audio, video, image and barcodes.

  2.      CommCare

CommCare is an open-source mobile platform designed for data collection, client management, decision support, and behavior change communication. CommCare consists of two main technology components: CommCare Mobile and CommCareHQ.

The mobile application is used by client-facing community health workers and enumerators during visits as a data collection and educational tool, and includes optional audio, image, GPS location and video prompts. Users access the application-building platform through the website CommCareHQ, which is operated on a cloud-based server.


CommCare supports J2ME feature phones, Android phones and Android tablets, and can capture photos and GPS readings. It also supports multiple languages and non-Roman character scripts, as well as the integration of multimedia (image, audio and video).

CommCare mobile versions allow applications to run offline, and collected data can be transmitted to CommCareHQ when wireless (GPRS) or Internet (Wi-Fi) connectivity becomes available.

  3.      GeoODK

GeoODK provides a way to collect and store geo-referenced information, along with a suite of tools to visualize, analyze and manipulate ground data for specific needs. It enables an understanding of the data for decision-making, research, business, disaster management, agriculture and more.

It is based on the Open Data Kit (ODK), but has been extended with offline/online mapping functionality, the ability to have custom map layers, and new spatial widgets for collecting points, polygons and GPS traces.


This one blog post cannot cover each and every tool for mobile data collection, but other tools that can be used, each with its own unique features, include OpenXData and EpiSurveyor.

Why Use Mobile Technology in Collecting Data

Using mobile technology to collect data has several advantages, including:

  •         It makes skipping questions harder.
  •         It gives immediate (real-time) access to the data from the server, which also makes data aggregation and analysis very rapid.
  •         It minimizes the workforce needed, and hence reduces the cost of data collection, by cutting out data entry personnel.
  •         It enhances data security through data encryption.
  •         It can collect many data types, such as audio, video, barcodes and GPS locations.
  •         It increases productivity by skipping the data entry middle man.
  •         It saves the costs of printing, storing and managing the documents associated with paper-based data collection.
