Learn data visualization and data-driven journalism in a real “Data Jam”

Happy Feraren - February 17, 2015 in Events

[Cross posted from GMA News Online; Press release by Bantay.ph]

On February 21 (International Open Data Day), Bantay.ph, a platform that uses technology in mobilizing citizens to demand good governance, will host the very first citizen-initiated “Data Jam.”

Done in partnership with the Southeast Asia Technology and Transparency Initiative (SEATTI), the Data Jam aims to get citizens to participate in governance via data analysis and visualization.
The event also aims to teach the general public and journalists alike the fundamentals of data visualization and data-driven journalism through a real hands-on experience.
“Data Journalism in the Philippines is still a wide-open field,” notes TJ Dimacali, Philippine Cyberpress president. “It’s an exciting frontier, especially for tech-savvy journalists. But it’s also something anyone can do, given the right tools.”
Bantay.ph co-founder and 2014 School of Data Fellow Happy Feraren explains: “The information that we hope to mine from the activity can give us an insight on how and where exactly our systems of governance are failing. It can help us identify what exactly is going wrong and instead of pointing fingers,  we can use this information to improve the lapses of the bureaucracy.”
The Data Jam hopes to introduce the frontier of using data to raise awareness and give feedback to government. Feraren adds, “We will group writers, graphic designers, and data analysts together to come up with questions and find the answers together.”
The program and activity flow will be based on the international School of Data toolkit. Using the open datasets of Bantay.ph and the Civil Service Commission, the event wants to get people with the right skillsets to work together and discover new stories from the raw datasets provided. Overall, it’s a new way to shed light on national issues and is a slicker and more efficient way to give feedback to government.
The Philippines has been a signatory of the global Open Government Partnership (OGP) since 2011, which essentially encourages participatory governance and openness in the bureaucracy. One way that the OGP suggests is the use and application of open data provisions. Given the amount of public data, there should be a conscious effort to make these datasets available and easily accessible. And at the same time, citizens should make use of these datasets to ensure transparency is met.
“We want to promote that kind of culture where we make data-driven decisions, especially when it comes to matters of governance. There is so much we can do to track what government is doing and how they are performing. It’s one concrete way to tell them, as citizens, that ‘we are watching you,’ ” says Feraren. “It’s one way we can promote a culture of active citizenship – where we don’t just rely on mainstream media to know what’s really happening. There’s a whole lot of data out there that we don’t look at and given the right training and awareness, citizens CAN mine their own insights out of publicly available data.”
The Data Jam is organized by Bantay.ph and SEATTI, co-sponsored by the Open Knowledge Foundation, The School of Data – Philippines, and the Philippine Cyberpress.
Interested data analysts, storytellers, and graphic designers can RSVP via info@bantay.ph – LIMITED SLOTS ONLY and RSVP is a must. It will be held on February 21, 1-5pm at the AIM Conference Center. Full details will be sent to confirmed participants.


Highlights from the Ask Us Anything hangouts (part I)

Zara Rahman - February 17, 2015 in Fellowship

We carried out a couple of video hangouts with 2014 fellows to talk more about last year’s fellowship programme, and about the upcoming programme which has an open Call for Applications, closing on March 10th. For those who prefer reading to watching, here are some highlights and questions that came up during the video hangouts!

Question: What is the typical day of a School of Data fellow?

Happy: As a fellow, I spent a lot of time learning! The Fellowship really helped me to be brave and dive into the data… other than the events you have to do, a lot of it is a learning experience. It really never stops!

Yuandra: Usually, I meet with a lot of people – working with data is very new in my country, Indonesia, so there was lots of interest. I spent lots of time going from organisation to organisation, raising awareness of what they can do with data. Then, planning training – the materials, preparing them, thinking about how to package the materials in a way that people will understand.

Question: What skills are needed to be a School of Data fellow?

Milena: We’re looking for a diversity of skills among the fellows, we’re hoping each fellow will have a strong skill that they’ll be able to teach others, as well as be able to identify gaps in their own knowledge. We only have 7 spaces this year, which is fewer than last year, so it will (hopefully!) be a competitive process.

Codrina: It’s important to have some connections in your region, because the Fellowship (and School of Data) is not just about learning things for yourself, but then to take what you have learned and what you know, and spread it in your own geographical context. Or if you don’t already – be prepared to go around and meet lots of new organisations and build the community around you!

Yuandra: Community building is really important, you’ll be working with other organisations around you who definitely have the need for data. So is communication: my background is very technical, but this Fellowship taught me how to put my technical jargon aside, and explain issues in a simple way for newcomers to the topic.

Question: What kinds of projects did Fellows carry out?

Yuandra: I worked with Publish What You Pay (who work on extractive industries transparency), who previously only used data in Excel, and for reports. When I went there, one of my main points was to show them how they can use data in other ways, for example in visualisations and infographics. They’re still in an early stage of working with data, but they’ve come a long way!

Codrina: I’m a mapping person, so much of the work I did involved either building maps or teaching people how to use them, and how to stay away from usual map problems. I went to Bosnia & Herzegovina, and worked on election maps. If you’re ever curious about the most horrible election system in the world – take a look! We spent a week trying to work out how it works, we ended up asking people to explain the system in a 3 minute video, which worked really well.

Happy: I found that it’s hard to ‘sell’ open data to different CSOs just by explaining – so, I wanted to use my own organisation as a model, to demonstrate what exactly people can do with open data. It was a really good way actually for us to engage with government – you build trust, and partnerships with them, by teaching them what they can do with data. Now, the government are opening up datasets that they’ve never opened before – so this is really exciting for me.

Nisha: We did a data journalism workshop for people who are really not very technologically savvy – it was really rewarding because after a while of working with people who want to know more advanced stuff, you can forget there’s lots of people who still want to know the basics, so you get to open this whole new world to them. We also did a data expedition with an organisation that’s working in the urban space in Hyderabad, with data that they’d collected.

If you like the sound of what last year’s fellows got up to – why not apply yourself and join us as one of the 2015 Fellows? More details are available here, and if you have any further questions please drop us a line on info[at]schoolofdata.org or on @SchoolofData. Applications close on March 10th, and we look forward to hearing from you!


Ask Us Anything – watch it online now

Zara Rahman - February 16, 2015 in Fellowship

To talk through the fellowship programme and hear from last year’s fellows, we held a couple of online hangouts: you can watch them here, and if you have any further questions, feel free to drop us a line on info@schoolofdata.org, or tweet us @SchoolOfData

On Monday 16th February, our 2014 Fellows Codrina, Happy and Yuandra, from Romania, the Philippines, and Indonesia respectively, joined myself and Milena to talk through their experiences in last year’s fellowship.

Here’s the video online (just under an hour long):

And on Tuesday 17th February, Olu and Nisha, from Nigeria and India respectively, joined us to discuss their fellowship. Here’s their video, which is just over 30 minutes long:


It’s time to get data-savvy: host a School of Data fellow in 2015!

Zara Rahman - February 10, 2015 in Fellowship

We’re looking for local NGOs based in countries classified as low income, lower-middle income or upper-middle income to host our School of Data fellows.


Apply here

We have funding for 7 School of Data fellows to take part in our 2015 Fellowship Programme, and from previous experience, we’ve found that the fellowships work best when there is an established local host.

Who are we looking for?

School of Data is promoting data literacy by working with local partners to create impactful data-driven projects. We’re looking for organisations that need support in using data more effectively and that are willing to work closely with one of our School of Data fellows over a 9 month period.

If you are selected, you’ll welcome a School of Data fellow in your office on a regular basis, to work on concrete projects and provide you with custom trainings and support, depending on what you need most. You’ll open up your data to the fellow and allow them to see how you work with data now, so that they can help guide your organisation towards being more data-savvy and using data to strengthen your work, be that in the field of advocacy, campaigning, journalism, or elsewhere within the civil society space. You’ll support the growth of the data-literate community, by inviting those within your network to attend trainings, and organising your own data expeditions, supported closely by the School of Data fellow.

What we expect you to contribute

This programme involves a great deal of resources and commitment from us, and we expect an equal amount of resources and commitment from our partners.

The ideal partner would be able to commit:

  • To support the fellow’s work, objectives and their overall work with your organisation without overburdening them or putting them in difficult situations
  • A good data driven project idea for what you want to achieve together with the fellow. This could be a specific data driven application (a web application, a website, an addition to an existing project or site, a mobile app), broader organisational support to use data, or any other feasible use for open data. The project must hold the potential to engage a large audience, to create a positive change for a community, region or country, and directly promote your organisation goals and objectives.
  • A team for the project. We want to create sustainable projects, and work with you to achieve systemic change within your organisation. We can’t do this with only one person. We would like to work with the relevant team of people in your organisation, depending on your needs and capacity.
  • In addition, we also welcome in kind or financial support for our fellowship programme. Our programme funds the work of the fellow, including a part time equivalent monthly stipend and some travel support, but we appreciate additional support that can complement our programme. Get in touch to understand more about the type of support you can provide.

What you’ll get in return

If you are accepted as our local partner, we’ll ask for your assistance in selecting the best applicant to be the School of Data fellow who will work with you. The fellow will support you by:

  • Evaluating your organisational capacity to work with data
  • Delivering custom training and support for your organisation depending on your needs
  • Working with you on a concrete data driven project

Here is just an example of what our 2014 fellow Hannah Williams worked on together with local partners from South Africa: http://capetownbudgetproject.org.za/

Interested? Get in touch.

Apply here

Deadline: March 10th

You are also welcome to contact us on info@schoolofdata.org while you are preparing your application; we’d be happy to answer your questions and help you put together a good application.


Call for Applications: School of Data 2015 Fellowship programme now open!

Zara Rahman - February 10, 2015 in Fellowship

We’re very happy to open today our 2015 Call for School of Data Fellowships!

Apply here


Following our successful 2014 School of Data Fellowships, we’re opening today our Call for Applications for the 2015 Fellowship programme. As with last year’s programme, we’re looking to find new data trainers to spread data skills around the world.

As a School of Data fellow, you will receive data and leadership training, as well as coaching to organise events and build your community in your country or region. You will also be part of a growing global network of School of Data practitioners, benefiting from the network effects of sharing resources and knowledge and contributing to our understanding about how best to localise our training efforts.

As a fellow, you’ll be part of a nine-month training programme where you’ll work with us for an average of ten working days a month, including attending online and offline trainings, organising events, and being an active member of the thriving School of Data community.

Get the details

Our 2015 fellowship programme will run from April-December 2015. We’re asking for 10 days a month of your time – consider it to be a part time role, and your time will be remunerated. To apply, you need to be living in a country classified as low income, lower-middle income or upper-middle income, as classified here.

Who are we looking for?

People who fit the following profile:

  • Data savvy: has experience working with data and a passion for teaching data skills.
  • Social change: understands and is interested in the role of Non-Governmental Organizations (NGOs) and the media in bringing positive change through advocacy, campaigns, and storytelling.
  • Has some facilitation skills and enjoys community-building (both online and offline) – or, eager to learn and develop their communication and presentation skills
  • Eager to learn from and be connected with an international community of data enthusiasts
  • Language: a strong knowledge of English – this is necessary in order to communicate with other fellows, to take part in the English-run online skillshares and the offline Summer Camp

To give you an idea of who we’re looking for, check out the profiles of our 2014 fellows – we welcome people from a diverse range of backgrounds, too, so people with new skillsets and ranges of experience are encouraged to apply.

This year, we’d love to work with people with a particular topical focus, especially those interested in working with extractive industries data, financial data, or aid data.

There are 7 fellowship positions open for the April to December 2015 School of Data training programme.

Geographical focus

We’re looking for people based in low-, lower-middle, and upper-middle income countries as classified by the World Bank, and we have funding for Fellows in the following geographic regions:

  • One fellow from Macedonia
  • One fellow from Central America – focus countries Costa Rica, Guatemala, Honduras, Nicaragua
  • One fellow from South America – focus countries Bolivia, Peru, Ecuador
  • Two fellows based in African countries (i.e. two different countries)
  • Two fellows based in Asian countries (i.e. two different countries)

What does the fellowship include?

As a School of Data fellow, you’ll be part of our 9-month programme, which includes the following activities:

  • guided and independent online and offline skillshares and trainings, aimed at developing data and leadership skills;
  • individual mentoring and coaching;
  • an appropriate stipend equivalent to a part time role;
  • participation in the annual School of Data Summer Camp, which will take place in May 2015 – location to be confirmed;
  • participation in activities within a growing community of School of Data practitioners to ensure continuous exchange of resources, knowledge and best practices;
  • training and coaching of the fellow in participatory event management, storytelling, public speaking, impact assessment etc;
  • opportunities for paid work – often training opportunities arise in the countries where the fellows are based;
  • potential work with one or more local civil society organisations to develop data driven campaigns and research.

What did last year’s fellows have to say?

Check out the Testimonials page to see what the 2014 Fellows said about the programme, or watch our Summer Camp video to meet some of the community.

Support

This year’s fellowships will be supported by the Partnership for Open Development (POD) OD4D, Hivos, and the Foreign and Commonwealth Office in Macedonia. We welcome more donors to contribute to this year’s fellowship programme! If you are a donor and are interested in this, please email us at info@schoolofdata.org.

Got questions? See more about the Fellowship Programme here and have a look at this Frequently Asked Questions (FAQ) page, or watch the Ask Us Anything Hangouts that we held in mid-February to take your questions and chat more about the fellowship.

Not sure if you fit the profile? Have a look at our 2013 and 2014 fellows’ profiles. Women and other minorities are encouraged to apply.

Convinced? Apply now to become a School of Data fellow. The application will be open until March 10th and the programme will start in April 2015.


Data expedition tutorial: UK and US video game magazines

Cédric Lombion - February 3, 2015 in Data Cleaning, HowTo, Spreadsheets, Storytelling, Workshop Methods

Data Pipeline

This article is part tutorial, part demonstration of the process I go through to complete a data expedition alone, or as a participant during a School of Data event. Each of the following steps will be detailed: Find, Get, Verify, Clean, Explore, Analyze, Visualize, Publish.

Depending on your data, your source or your tools, the order in which you will be going through these steps might be different. But the process is globally the same.


FIND

A data expedition can start from a question (e.g. how polluted are European cities?) or a data set that you want to explore. In this case, I had a question: Has the dynamic of the physical video game magazine market been declining in the past few years? I have been studying the video game industry for the past few weeks and this is one of the many questions that I set myself to answer. Obviously, I thought about many more questions, but it’s generally better to start focused and expand your scope at a later stage of the data expedition.

A search returned Wikipedia as the most comprehensive resource about video game magazines. They even have some contextual info, which will be useful later (context is essential in data analysis).

Screenshot of the Wikipedia table about video game magazines https://en.wikipedia.org/wiki/List_of_video_game_magazines

GET

The Wikipedia data is formatted as a table. Great! Scraping it is as simple as using the importHTML function in a Google spreadsheet. I could copy/paste the table, but that would be cumbersome with a big table and the result would have some minor formatting issues. LibreOffice and Excel have similar (but less seamless) web import features.

importHTML takes 3 arguments: the link to the page, the type of data to import (table or list), and the rank of the table (or the list) in the page. If no rank is indicated, it will grab the first one.
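For readers who would rather see the idea outside a spreadsheet, here is a minimal Python sketch of what "grab the table of a given rank from a page" means. It is not how importHTML is implemented: the `TableGrabber` class, the `import_html_table` function and the sample page are all hypothetical, the sample data is made up, and the sketch ignores nested tables.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect the rows of every <table> in a page, in document order."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, rank=1):
    """Return the rank-th table (1-based) of an HTML string as a list of rows."""
    grabber = TableGrabber()
    grabber.feed(html)
    return grabber.tables[rank - 1]


# Made-up page with two small tables, standing in for a real web page.
SAMPLE_PAGE = """
<table><tr><th>Name</th><th>Founded</th></tr>
<tr><td>Magazine A</td><td>1981</td></tr></table>
<table><tr><th>Other</th></tr><tr><td>x</td></tr></table>
"""

first_table = import_html_table(SAMPLE_PAGE)           # no rank given: first table
second_table = import_html_table(SAMPLE_PAGE, rank=2)  # explicit rank
```

The default `rank=1` mirrors the behaviour described above, where leaving the rank out returns the first table.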

Once I have the table, I do two things to help me work more quickly:

  • I change the font and cell size to the minimum so I can see more at once
  • I copy everything, then go to Edit→Paste Special→Paste values only. This way, the table is not linked to importHTML anymore, and I can edit it at will.

VERIFY

So, will this data really answer my question completely? I do have the basic data (name, founding date, closure date), but is it comprehensive? A double check with the French Wikipedia page about video game magazines reveals that many French magazines are missing from the English list. Most of the magazines represented are from the US and the UK, and probably only the most famous. I will have to take this into account going forward.

CLEAN

Editing your raw data directly is never a good idea. A good practice is to work on a copy or in a nondestructive way – that way, if you make a mistake and you’re not sure where, or want to go back and compare to the original later, it’s much easier. Because I want to keep only the US and UK magazines, I’m going to:

  • rename the original sheet as “Raw Data”
  • make a copy of the sheet and name it “Clean Data”
  • sort the Clean Data sheet alphabetically by the “Country” column
  • delete all the lines for countries other than the UK or the US.
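The same non-destructive habit carries over to code. The sketch below is only an illustration: the rows are made up, and the "UK" and "US" country labels are an assumption about how the column is filled in. The point is that `raw_data` is never modified, only a copy is sorted and filtered.

```python
import copy

# Made-up rows standing in for the imported table; not the real Wikipedia data.
raw_data = [
    {"Name": "Magazine A", "Country": "US", "Founded": "1981"},
    {"Name": "Magazine B", "Country": "France", "Founded": "1990"},
    {"Name": "Magazine C", "Country": "UK", "Founded": "1985"},
]

# Assumed labels for the countries we want to keep.
KEPT_COUNTRIES = {"UK", "US"}


def make_clean_copy(rows):
    """Return a sorted, filtered deep copy; the rows passed in are left untouched."""
    working_copy = copy.deepcopy(rows)
    working_copy.sort(key=lambda row: row["Country"])
    return [row for row in working_copy if row["Country"] in KEPT_COUNTRIES]


clean_data = make_clean_copy(raw_data)
```

If something looks wrong later, `raw_data` is still there to compare against, just like the “Raw Data” sheet.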

Making a copy of your data is important

Tip: to avoid moving your column headers when ordering the data, go to Display→Freeze lines→Freeze 1 line.

Ordering the data to clean it

Some other minor adjustments have to be made, but they’re light enough that I don’t need to use a specialized cleaning tool like OpenRefine. Those include:

  • Splitting the lines where 2 countries are listed (e.g. PC Gamer becomes PC Gamer UK and PC Gamer US)
  • Deleting the ref column, which adds no information
  • Deleting one line where the founding date is missing
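I made these adjustments by hand in the spreadsheet, but they can also be sketched in Python. Everything here is an assumption made for illustration: the rows are invented, the two countries are assumed to be separated by a slash, and the column names “Ref” and “Founded” are guesses at the sheet's headers.

```python
def tidy_rows(rows):
    """Split two-country rows, drop the Ref column, skip rows with no founding date."""
    tidied = []
    for row in rows:
        if not row["Founded"].strip():
            continue  # founding date missing: leave the row out
        has_two_countries = "/" in row["Country"]
        for country in row["Country"].split("/"):
            country = country.strip()
            # Copy every column except the reference column.
            new_row = {key: value for key, value in row.items() if key != "Ref"}
            if has_two_countries:
                # One line per country, with the country added to the name.
                new_row["Name"] = f"{row['Name']} {country}"
            new_row["Country"] = country
            tidied.append(new_row)
    return tidied


# Made-up rows; the slash separator is an assumption about the source format.
sample_rows = [
    {"Name": "Magazine A", "Country": "UK / US", "Founded": "1993", "Ref": "[1]"},
    {"Name": "Magazine B", "Country": "UK", "Founded": "", "Ref": ""},
    {"Name": "Magazine C", "Country": "US", "Founded": "1988", "Ref": "[2]"},
]

tidied_rows = tidy_rows(sample_rows)
```

With only a handful of affected lines, doing this by hand is just as reasonable; a script starts to pay off when the table is long or the import has to be repeated.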

EXPLORE

I call “explore” the phase where I start thinking about all the different ways my cleaned data could answer my initial question[1]. Your data story will become much more interesting if you attack the question from several angles.

There are several things that you could look for in your data:

  • Interesting Factoids
  • Changes over time
  • Personal experiences
  • Surprising interactions
  • Revealing comparisons

So what can I do? I can:

  • display the number of magazines in existence for each year, which will show me if there is a decline or not (changes over time)
  • look at the number of magazines created per year, to see if the market is still dynamic (changes over time)

For the purpose of this tutorial, I will focus on the second one, looking at the number of magazines created per year. Another tutorial will be dedicated to the first, because it requires a more complex approach due to the formatting of our data.

At this point, I have a lot of other ideas: Can I determine which year produced the most enduring magazines (surprising interactions)? Will there be anything to see if I bring in video game website data for comparison (revealing comparisons)? Which magazines have lasted the longest (interesting factoid)? This is outside of the scope of this tutorial, but those are definitely questions worth exploring. It’s still important to stay focused, but writing them down for later analysis is a good idea.

ANALYSE

Analysing is about applying statistical techniques to the data and questioning the (usually visual) results.

The quickest way to answer our question “How many magazines have been created each year?” is by using a pivot table.

  1. Select the part of the data that answers the question (columns name and founded)
  2. Go to Data->Pivot Table
  3. In the pivot table sheet, I select the field “Founded” as the column. The founding years are ordered and grouped, allowing us to count the number of magazines for each year starting from the earliest.
  4. I then select the field “Name” as the values. Because the pivot table expects numbers by default (it tries to apply a SUM operation), nothing shows. To count the number of names associated with each year, the correct operation is COUNTA. I click on SUM and select COUNTA from the drop down menu.
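For comparison, the pivot table steps above boil down to "count the non-empty names under each founding year". A minimal Python sketch of that operation, using invented (name, year) pairs rather than the real table:

```python
from collections import Counter

# Made-up (name, founding year) pairs, not the real table.
magazines = [
    ("Magazine A", 1981),
    ("Magazine B", 1983),
    ("Magazine C", 1981),
    ("Magazine D", 1990),
]


def count_by_founding_year(rows):
    """Count the non-empty names under each founding year, earliest year first."""
    counts = Counter(year for name, year in rows if name)
    return sorted(counts.items())


created_per_year = count_by_founding_year(magazines)
```

Skipping empty names is a rough stand-in for what COUNTA does in the spreadsheet, and sorting the result matches the ordered years of the pivot table.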

This data can then be visualized with a bar graph.

Video game magazine creation every year since 1981

The trendline seems to show a decline in the dynamic of the market, but it’s not clear enough. Let’s group the years by half-decade and see what happens:

The resulting bar chart is much clearer:

The number of magazines created every half-decade decreases a lot in the lead up to the 2000s. The slump of the 1986-1990 years is perhaps due to a lagging effect of the North American video game crash of 1982-1984.
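The half-decade grouping can be sketched in a few lines of Python. The bin boundaries are inferred from the chart starting in 1981 and from the 1986-1990 period mentioned above, so each bin is assumed to start on a year ending in 1 or 6; the sample years are made up.

```python
from collections import Counter


def half_decade_label(year):
    """Map a year to its five-year bin, e.g. 1988 falls in 1986-1990."""
    start = (year - 1) // 5 * 5 + 1
    return f"{start}-{start + 4}"


def count_by_half_decade(years):
    """Count how many founding years fall into each bin, earliest bin first."""
    counts = Counter(half_decade_label(year) for year in years)
    return sorted(counts.items())


# Made-up founding years, one per magazine.
sample_years = [1981, 1985, 1986, 1990, 1991, 2014]
per_half_decade = count_by_half_decade(sample_years)
```

Wider bins smooth out the year-to-year noise, which is why the trend is easier to read, at the cost of hiding what happens inside each five-year period.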

Contrary to what we might have assumed, the market is still dynamic, with one magazine founded every year for the last 5 years. That makes for an interesting, nuanced story.

VISUALISE

In this tutorial the initial graphs created during the analysis are enough to tell my story. But if the results of my investigations required a more complex, unusual or interactive visualisation to be clear for my public, or if I wanted to tell the whole story, context included, with one big infographic, it would fall into the “visualise” phase.

PUBLISH

Where to publish is an important question that you have to answer at least once. Maybe the question is already answered for you because you’re part of an organisation. But if you’re not, and you don’t already have a website, the answer can be more complex. Medium, a trendy publishing platform, only allows images at this point. WordPress might be too much for your needs. It’s possible to customize the JavaScript of Tumblr posts, so it’s a solution. Using a combination of GitHub Pages and Jekyll, for the more technically inclined, is another. If a light database is needed, take a look at tabletop.js, which allows you to use a Google spreadsheet as a quasi-database.


Any data expedition, of any size or complexity, can be approached with this process. Following it helps you avoid getting lost in the data. More often than not, there will be a need to get and analyze more data to make sense of the initial data, but it’s just a matter of looping the process.

[1] I formalized the “explore” part of my process after reading the excellent blog from MIT alumnus Rahul Bhargava http://datatherapy.wordpress.com


The Data Journalism Bootcamp at AUB Lebanon

Ali Rebaie - January 29, 2015 in Data Journalism, Events, Fellowship

Data love is spreading like never before. Unlike previous workshops we did in the MENA region, the intensive data journalism workshop we started on the 18th of January 2015 at the American University of Beirut ran for four consecutive days, in collaboration with Dr. Jad Melki, Director of the media studies program at AUB. The Data team at Data Aurora were really happy to share this experience with students from different academic backgrounds, including media studies, engineering and business.

The workshop was mainly led by Ali Rebaie, a Senior School of Data fellow, and Bahia Halawi, a data scientist at Data Aurora, along with the data community team assistants: Zayna Ayyad, Noor Latif and Hsein Kassab. The aim of the workshop was to give the students an introduction to the world of open data and data journalism, in particular, through tutorials on open source tools and methods used in this field. Moreover, we wanted to put students on track regarding the use of data.

On the first day, the students were introduced to data journalism from a theoretical angle, in particular the data pipeline, which outlines the different phases in any data visualization project: find, get, verify, clean, analyze and present. After that, students got hands-on with scraping and cleaning data using tools such as OpenRefine and Tabula.

Day two was all about mapping, from mapping best practices to mapping formats and shapes. Students were first exposed to different types of maps and the design styles that serve the purpose of each map. Moreover, the best mapping techniques and visualizations were highlighted, along with the purpose each one serves. Eventually, participants became able to differentiate between dot maps and choropleth maps, as well as many others. Then they used Twitter data that contained geolocations to contrast varying tweeting zones by placing these tweets at their origins on CartoDB. Similarly, they created other maps using QGIS and TileMill. The mapping exercises were really fun and students were very happy to create their own maps without a single line of code.

On the third day, Bahia gave a lecture on network analysis, some important mathematical notions needed for working with graphs, as well as possible uses and case studies related to this field. Meanwhile, Ali presented different open data portals to provide the students with more resources and data sets. After these topics were covered, a technical demonstration on the use of a network analysis tool to analyze two topics was performed. Students analyzed climate change, and later the AUB media group on Facebook was also analyzed and its graph drawn. It was very cool to find out that one of the top influencers in that network was among the students taking the training. Students were also taught to do the same analysis for their own friends’ lists. Facebook data was collected and the visualizations were drawn in a network visualization tool.

After completing the interactive types of visualizations, the fourth day was about static ones, mainly infographics. Each student had the chance to extract the information needed for an interesting topic and transform it into a visual piece. Bahia worked with the students, teaching them how to refine the data so that it becomes simple and short, and thus usable for building the infographic design. Later, Yousif, a senior creative designer at Data Aurora, trained the students on the use of Photoshop and Illustrator, two of the tools commonly used by infographic designers. At the end of the session, each student submitted a well done infographic, some of which are posted below.

After the workshop Zayna had small talks with the students to get their feedback and here she quoted some of their opinions:

“It should be a full course, the performance and content was good but at some point, some data journalism tools need to be more mature and user-friendly to reduce the time needed to create a story,” said Jad Melki, Director of media studies program at AUB, “it was great overall.”

Static infographics developed by the students at the workshop.

“It’s really good but the technical parts need a lot of time. We learned about new apps. Mapping, definitely I will try to learn more about it,” said Carla Sertin, a media student.

“It was great we got introduced to new stuff. Mapping, I loved it and found it very useful for me,” said Ellen Francis, civil engineering student. “The workshop was a motivation for me to work more on this,” she added, “it would work as a one semester long course.”

Azza El Masri, a media student, is interested in doing an MA in data journalism. “I like it I expected it to be a bit harder, I would prefer more advanced stuff in scraping,” she added.

 


Memories from San Jose

escueladedatos - January 29, 2015 in Data Expeditions

This article was originally posted in Spanish at Escuela de Datos by Phi Requiem, School of Data fellow in Mexico.


Last November, the Open Government Partnership (OGP) Summit took place in Latin America. CSO participants from 18 countries got together to share and exchange in an “unconference” where many topics were discussed. It was really interesting to learn how data is handled in different countries, and to pinpoint the similarities and differences between our contexts.

After a few words from the President of Costa Rica and other government representatives, a series of talks and roundtables began… And then, in parallel, Antonio (School of Data fellow in Peru) and I started a datathon.

In this datathon, our task was to give training and support to the five teams as they posed questions of the dataset on the commitments of the OGP countries, which can be found here → Action Plan Commitments and IRM Data, http://goo.gl/yZmcKC, http://goo.gl/vLgYWj

The first step was to approach the data and structure it. After this, it was time to pose the questions we wanted to answer through the analysis of this data, and a lot of great questions (and interesting purposes) arose – many more than time allowed us to develop further. Teams picked the topics that seemed most relevant to them.

Teams were already working on their analysis at 9 sharp the following morning, while OGP San Jose sessions were taking place. The datathon participants looked for more data, did cross-comparisons, scraping, etc. By noon, they had found results and answers – it was time to start working to present them in visualizations, infographics, maps, articles, etc. At 3PM, the teams impressed us with their presentations, and showed us the following outcomes: http://ogpcr.hackdash.org

  • Team Cero Riesgos: Generating information on risks by area. Data: OIJ, Poder Judicial.
  • Team Accesa: Comparing the perception of Latin American citizens on current topics in the LatinoBarometer with the commitments and achievements per country. The goal: to know if governments are responding to citizen concerns.
  • Team E’dawokka: Comparing the agendas and priorities of Central America with those in the rest of Latin America.
  • Team InfografiaFeliz: What countries look like in the Human Development Index in terms of their anti-corruption measures (and their success).
  • Team Bluffers: Measuring the percentage of delay and achievement of the commitments acquired by each country, and relating the design process for the commitments (measured by their relevance and potential impact) and their achievement.
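As a rough illustration of the kind of calculation Team Bluffers describes, the sketch below computes the share of completed and delayed commitments per country. The rows, the country labels and the field names (`country`, `status`) are invented for illustration; they are not taken from the OGP dataset, and the team's actual method may have differed.

```python
from collections import Counter, defaultdict

# Hypothetical rows, one per commitment. These are NOT real OGP data.
commitments = [
    {"country": "A", "status": "completed"},
    {"country": "A", "status": "delayed"},
    {"country": "A", "status": "completed"},
    {"country": "B", "status": "delayed"},
    {"country": "B", "status": "completed"},
]


def status_shares(rows):
    """Return {country: {status: percentage}} for the given commitment rows."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["country"]][row["status"]] += 1

    shares = {}
    for country, statuses in counts.items():
        total = sum(statuses.values())
        shares[country] = {
            status: round(100 * n / total, 1) for status, n in statuses.items()
        }
    return shares


shares = status_shares(commitments)
for country, pct in sorted(shares.items()):
    print(country, pct)
```

With a real dataset, the same grouping could be done in a spreadsheet pivot table; the point is only that the percentages are counts per status divided by the total per country.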

At the end of the day, the jury chose teams InfografiaFeliz and Accesa as winners, which earned them a cash prize.

This was the first data expedition in Costa Rica, and you can find more in the following links: https://www.facebook.com/ogpsanjose, https://twitter.com/OGPSanJose, https://www.flickr.com/photos/ogpsanjose, http://grupoincocr.com/open-data/miembros-de-grupo-inco-ganan-la-primera-expedicion-de-datos-en-costa-rica

What I take away from my experience in this expedition is that people are always willing to learn and create, but not everyone is aware of what open data is, or how it can be useful for them. Initiatives of this sort are achieving their mission, but are not enough on their own – and that’s why we need to keep in touch with the participants and encourage them to share their experiences and, why not, to replicate these initiatives.

Here are some tips for people with an interest in running data expeditions:

  • It’s difficult to explain the difference between a hackathon and a data expedition… But, the earlier this is out of the way, the better.
  • There must be a conceptual baseline. With such limited time it’s difficult to give introductions or preliminary workshops, but trying to do a bit of this can be really useful.
  • Teams always have good ideas to handle information and show conclusions, but many times impose limitations on themselves because they think the technical barriers are huge. Having a hackpad or Drive folder with examples and lists of tools can help people overcome that fear.


Digital Methods Initiative Winter School, University of Amsterdam

Sam Leon - January 27, 2015 in Events

Exploding book in the pulpit of the Algemene Doopsgezinde Sociëteit in Spui, Amsterdam where some of the sprint took place

Last week I attended the 7th annual Winter School at the University of Amsterdam. Run by the Digital Methods Initiative, it took the form of a data sprint in which students joined professional developers and designers to answer research questions using social media data.

The DMI group at Amsterdam have developed and collated a suite of easy-to-use tools specifically for this kind of research. They are well worth checking out for anyone interested in this field and they cover a range of techniques from web scraping to list triangulation, and can be found online here.

I joined a group looking at bias across three APIs through which you can acquire Twitter data: the Search API, the Stream API and the proprietary Firehose endpoint – generally regarded as the most complete source of Twitter data. We had three sets captured from the three separate APIs for a critical period between 7th and 15th October 2014 when the Hong Kong protests were taking place.
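One simple way to start comparing captures like these is to treat each one as a set of tweet IDs and look at the overlap. The sketch below is only an illustration of that idea, not the method our group used: the IDs are made up, and the names `search_ids`, `stream_ids`, `firehose_ids` and `coverage` are hypothetical.

```python
# Hypothetical tweet IDs captured from each source. Real captures would be
# loaded from files or a database; these values are invented.
search_ids = {101, 102, 103, 105}
stream_ids = {101, 102, 104, 105, 106}
firehose_ids = {101, 102, 103, 104, 105}


def coverage(sample, reference):
    """Share of the reference set that also appears in the sample (0.0 to 1.0)."""
    if not reference:
        return 0.0
    return len(sample & reference) / len(reference)


# Tweets captured live from the stream but absent from the Firehose set.
only_in_stream = stream_ids - firehose_ids

print("Search coverage of Firehose:", coverage(search_ids, firehose_ids))
print("Stream coverage of Firehose:", coverage(stream_ids, firehose_ids))
print("In Stream but not Firehose:", sorted(only_in_stream))
```

Set differences in both directions are worth checking, since a source that is generally more complete can still be missing tweets that another source caught.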

Other groups took on a range of tasks from mapping the open data revolution to tracking the global climate change debate. All projects deployed a range of data wrangling techniques to investigate these complex social, political and cultural phenomena.

A few things I learned:

  • Anyone wanting to use social media data to answer research questions about society and culture needs more than just spreadsheet skills. These datasets are generally larger than what Excel can comfortably handle, so basic database skills are a massive help.
  • Off-the-shelf tools for data analysis are brilliant, but you often need to tailor your lines of enquiry to your specific research question. Having some knowledge of programming means that you can take a much more flexible approach than when relying on GUI tools.
  • Working in such a collaborative, fast-paced environment meant that reproducibility (i.e. where different parts of the team would re-use scripts and code developed by other parts of the team) was essential, alongside creating documentation on the fly. We found IPython notebooks especially useful for this, whereas analytical steps taken in Excel were harder to reproduce.
  • Free Twitter data – like that which can be acquired from the Search and Stream APIs – is still good, and sometimes better than that which you get through the proprietary APIs. When investigating online reactions to contentious and controversial events – such as the Hong Kong protests – tweets will inevitably be removed both by users and Twitter. If you want to get the full story, it’s far better to scrape data as it comes in through the streaming API.
  • We’ve written about it before on this blog but the Pandas module for Python is brilliant for data wrangling and analysis and well worth getting to know if you plan on working with big datasets. It’s quick, flexible and powerful.
  • Nothing beats hands-on learning when it comes to technical skills. Having a motivating research question and some real-life data is the best way to learn how to use the multitude of tools now at any budding data wrangler’s disposal. I learnt more in a week than I could have in months reading about tools and languages in the abstract!
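To give a flavour of the “basic database skills” mentioned in the first point, here is a minimal sketch using SQLite through Python’s built-in `sqlite3` module. The table layout and the three sample rows are invented for illustration and are not from the sprint data.

```python
import sqlite3

# Hypothetical sample rows: (tweet id, ISO date, text). Not real data.
rows = [
    (1, "2014-10-07", "first example tweet"),
    (2, "2014-10-07", "second example tweet"),
    (3, "2014-10-08", "third example tweet"),
]

# An in-memory database; pass a file path instead to keep the data on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, day TEXT, text TEXT)")
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", rows)
conn.commit()

# Count tweets per day with SQL rather than a spreadsheet pivot table.
per_day = conn.execute(
    "SELECT day, COUNT(*) FROM tweets GROUP BY day ORDER BY day"
).fetchall()

for day, n in per_day:
    print(day, n)
```

The same query works unchanged whether the table holds three rows or several million, which is where a database starts to pay off over a spreadsheet.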

For those interested in attending a DMI school in the future – take a look at the summer school coming later in 2015.


Data literacy needs within the Follow the Money network

Zara Rahman - January 26, 2015 in Events, Follow the Money

Last week, I joined a meeting hosted by the Transparency and Accountability Initiative around ‘Follow the Money’. It brought together people working on various aspects of the money trail, from access to information, to developers, investigative journalists, campaigners and activists, to think about how we can better collaborate in the future, and where the gaps are in the network.

Data Pipeline

I had the pleasure of running a couple of School of Data related sessions, too – one short skillshare running through the ‘data pipeline’, and a longer session building out a ‘follow the money’ data pipeline, focused mainly on gathering various data sources on topics in this field. The pipeline, in its rough format, is online here, and I’ll publish it in a more accessible format on the School of Data site soon too.

The value of asking questions

These sessions made me think about how data literacy skills could be developed within this community, and what is really needed to support and further the work of Follow the Money initiatives. Pragmatically speaking, for technology and data to be engaged and used successfully to further people’s work, not everyone in that room needs to be a superstar data wrangler or developer. What they do need, though, is to know where the people with technical expertise are, and to be able to ask them for assistance.

In the ‘thanks’ at the end of the workshop, lots of us mentioned the value of being in a space where, as our facilitator Allen Gunn said, ‘asking a question is considered to be a heroic act of leadership’ rather than a signal of a lack of knowledge. It was obvious that we valued most the patience and understanding of those around us who have higher levels of knowledge in a certain field, be that topical expertise, or technical; and that for many, the opportunity to ask these technical questions comes far too rarely.

This made me think about the value of the School of Data community – in my follow up emails from the workshop, I’ve been connecting people from various countries and contexts to former fellows who are based near them, or people running local groups in neighbouring countries, who can help them in person as well as online with their data-related queries. From past experience of seeing how well our data trainers and community members work with civil society groups with lower levels of data literacy, I’m optimistic that this will work out well – whether it be simply exchanging a few emails, or working with the community members or us at School of Data central to commission actual in person trainings.

Data wrangling + topical expertise = effective data-driven campaigning

As I mentioned, these connections provide a somewhat pragmatic solution to a need for better use of data among the community. Ideally, however, we would have people based within these organisations for long term support, who have both topical expertise and data wrangling skills.

And from what I heard, the need for this skillset will become extremely pronounced in the coming years; various directives and new laws regarding data availability and transparency sitting at different points of the money trail will be coming into force over the next couple of years, and they will bring with them a deluge of data. Examples include data on extractives following Section 1504 of the Dodd-Frank Act, and company data following the EU Accounting and Transparency Directives. What stories lie within that data, and how can we uncover them?

Many of the people and organisations represented at the Follow the Money workshop have been instrumental in campaigning for those transparency directives; but how many of those organisations possess the in-house ability to actually process and use that data? Effectively, the next round of campaigning should be based on stories that come out of that hard-fought-for data – but for that to happen, we need to start preparing now, by building data and technical skills among our communities.

Laying the groundwork for data storytelling

So, how can we start doing this? It could be through providing support for current employees of organisations to attend data expeditions or data skills courses on an ongoing basis; not just one-off workshops, but people learning skills that are clearly relevant to their work, and having regular refresher courses to keep those skills fresh in their minds. Or (apologies for the blatant self-promotion here!) it could be through supporting topical School of Data fellows to be based within the community and provide ongoing support, focusing on a specific topic – like extractives, or corporate money flows, for example.

Our experiences from the 2014 fellowships have led us to believe that the fellowship scheme is a sustainable and successful method of building up capacity: it finds and supports data storytellers and trainers (the Fellows), and equips them with the skills they need to provide ongoing support to organisations based in their area, with whom they share their skills. Last year, the fellows carried out activities ranging from regular workshops with local organisations, to data clinics and expeditions for newcomers to get hands-on with data, to simply being present within organisations as in-house support.

From what I saw last week, a lot of organisations within the Follow the Money network could do with this support. The earlier we start developing this capacity, the better equipped we will be as a community to start delving into the avalanche of data that is soon to come our way.

If you want to find out more about the Fellowship scheme, see the section ‘Fellowship Programme’ in our 2014 Annual Report, and if you’d like to talk about supporting a fellow through our upcoming 2015 scheme, get in touch with me at zara.rahman [at] okfn.org
