
Data is a Team Sport: Government Priorities and Incentives

Dirk Slater - August 13, 2017 in Data Blog, Event report, Research

Data is a Team Sport is our open-research project exploring the data literacy eco-system and how it is evolving in the wake of post-fact, fake news and data-driven confusion.  We are producing a series of videos, blog posts and podcasts based on a series of online conversations we are having with data literacy practitioners.

To subscribe to the podcast series, cut and paste the following link into your podcast manager : or find us in the iTunes Store and Stitcher.

The conversation in this episode focuses on the challenges of getting governments to prioritise data literacy, both externally and internally, and on the incentives to produce open-data. It features:

  • Ania Calderon, Executive Director at the Open Data Charter, a collaboration between governments and organisations working to open up data based on a shared set of principles. For the past three years, she led the National Open Data Policy in Mexico, delivering a key presidential mandate. She established capacity building programs across more than 200 public institutions.
  • Tamara Puhovski, a sociologist, innovator, public policy junky and open government consultant. She describes herself as a time traveler journeying back to 19th and 20th century public policy centers and trying to bring them back to the future.

Notes from the conversation:

Access to government produced open-data is critical for healthy functioning democracies. It takes an eco-system that includes a critical thinking citizenry, knowledgeable civil servants, incentivised elected officials, and smart open-data advocates.  Everyone in the eco-system needs to be focused on long-term goals.

  • Elected officials need incentives beyond monetary arguments, as budgetary gains can take a long time to come to fruition.
  • Governments’ capacity to produce open-data is an issue that needs greater attention.
  • We need to get past just making arguments for open-data and be able to provide good, solid stories and examples of its benefits.

Resources mentioned in the conversation:

Also, although it was not mentioned, be sure to check out Tamara’s work on Open Youth.

View the full online conversation:


Data is a Team Sport: One on One with Friedhelm Weinberg

Dirk Slater - July 29, 2017 in Data Blog, Event report, Research

Data is a Team Sport is our open-research project exploring the data literacy eco-system and how it is evolving in the wake of post-fact, fake news and data-driven confusion.  We are producing a series of videos, blog posts and podcasts based on a series of online conversations we are having with data literacy practitioners.

To subscribe to the podcast series, cut and paste the following link into your podcast manager : or find us in the iTunes Store and Stitcher.

Friedhelm Weinberg is the Executive Director of Human Rights Information and Documentation Systems (HURIDOCS), an NGO that supports organisations and individuals to gather, analyse and harness information to promote and protect human rights.  In this conversation we take a look at what it takes to be both a tool developer and a capacity builder, and how the two disciplines can inform and build upon each other.  Some of the main points:

  • The capacity building work needs to come first and inform the tool development.
  • It’s critical that human rights defenders have a clear understanding of what they want to do with the data before they start collecting it.
  • It’s critical for human rights defenders to have their facts straight as this counts the most in international courts of law, and cuts through ‘fake news.’
  • Machine learning has enormous potential for documenting human rights abuses because it can process large amounts of casework.
  • They have been successful in bringing developers in-house by making efforts to get them to better understand how the capacity builders work and also vice-versa.

Specific projects within HURIDOCS that he talked about:

  • Uwazi is an open-source solution for building and sharing document collections
  • The Collaboratory is their knowledge sharing network for practitioners focusing on information management and human rights documentation.

Readings/Resources that are inspiring his work:

View the full online conversation:


Data is a Team Sport: One on One with Heather Leson

Dirk Slater - July 19, 2017 in Community, Data Blog, Event report, Research

Data is a Team Sport is our open-research project exploring the data literacy eco-system and how it is evolving in the wake of post-fact, fake news and data-driven confusion.  We are producing a series of videos, blog posts and podcasts based on a series of online conversations we are having with data literacy practitioners.

To subscribe to the podcast series, cut and paste the following link into your podcast manager : or find us in the iTunes Store and Stitcher.

This episode features a one-on-one conversation with Heather Leson, the Data Literacy Lead at the International Federation of Red Cross and Red Crescent Societies. As a technologist, she strengthens community collaboration via humanitarian technologies and social entrepreneurship. She builds partnerships, curates digital spaces, fosters volunteer engagement and delivers training while inspiring systems for co-creation with maps, code and data. At the International Federation of Red Cross and Red Crescent Societies, her mandate includes global data advocacy, data literacy and data training programs in partnership with the 190 national societies and the 13 million volunteers. She is a past Board Member at the Humanitarian OpenStreetMap Team (4 years) and Peace Geeks (1 year), and an Advisor for MapSwipe, which uses gamification systems to crowdsource disaster-based satellite imagery. Previously, she worked as Social Innovation Program Manager, Qatar Computing Research Institute (Qatar Foundation); Director of Community Engagement, Ushahidi; and Community Director, Open Knowledge (School of Data).

Main Points from the Conversation:

  • Data protection is the default setting for humanitarian organisations collecting data.
  • She’s found it’s critical to focus on people and what they are trying to accomplish, as opposed to focusing on tools.
  • She’s added ‘socialisation’ as the beginning step to the data pipeline.

Heather’s Resources


Heather’s work

The full online conversation:


Rethinking data literacy: how useful is your 2-day training?

Cedric Lombion - July 14, 2017 in Research

As of July 2017, School of Data’s network includes 14 organisations around the world which collectively organise hundreds of data literacy events every year. The success of this network-based strategy did not come naturally: we had to rethink and move away from our MOOC-like strategy in 2013 in order to be more relevant to the journalists and civil society organisations we intend to reach.

In 2016 we did the same for our actual events.

The downside of short-term events

Prominent civic tech members have long complained that hackathons are ineffective at building long-lasting solutions to the problems they intend to tackle. Yet various reasons have kept the hackathon popular: it’s short-term, can produce decent-looking prototypes, and is well-known even beyond civic tech circles.

The same holds true for the data literacy movement and its most common short-term events: meetups, data and drinks, one-day trainings, two-day workshops… they’re easy to run, fund and promote: what’s not to love?

Well, we’ve never really been satisfied with the outcomes we saw from these events, especially for our flagship programme, the Fellowship, which we monitor very closely and aim to improve every year. Following several rounds of surveys and interviews with members of the School of Data network, we were able to pinpoint the issue: our expectations and the actual value of these events are mismatched, leading us not to take critical actions that would multiply the value of these events.

The Data Literacy Activity Matrix

To clarify our findings, we put the most common interventions (not all of them are events, strictly speaking) in a matrix, highlighting our key finding that duration is a crucial variable. And this makes sense for several reasons:

  • Fewer people can participate in a longer event, but those who can are generally more committed to the event’s goals

  • Longer events have much more time to develop their content and explore the nuances of it

  • Especially in the field of data literacy, which is focused on capacity building, time and repetition are key to positive outcomes

[Image: Data Literacy Activity Matrix]

(the categories used to group event formats are based on our current thinking of what makes a data literacy leader: it underpins the design of our Fellowship programme.)

Useful for what?

The matrix allowed us to think critically about the added value of each subcategory of intervention. What is the effective impact of an organisation doing mostly short-term training events compared to another one focusing on long-term content creation? Drawing again from the interviews we’ve done and some analysis of the rare post-intervention surveys and reports we could access (another weakness of the field), we came to the following conclusions:

  • very short-term and short-term activities are mostly valuable for awareness-raising and community-building.

  • real skill-building happens through medium to long-term interventions

  • content creation is best focused on supporting skill-building interventions and data-driven projects (rather than hoping that people come to your content and learn by themselves)

  • data-driven projects (run in collaboration with your beneficiaries) are the ones creating the clearest impact (but not necessarily the longest lasting).

[Image: Data Literacy Matrix - Value Added]

It is important, though, not to set short-term and long-term interventions in opposition. Not only can the difference be fuzzy (a long-term intervention can be a series of regular, linked, short-term events, for example) but both play roles of critical importance: who is going to apply to a data training if people are not aware of the importance of data? Conversely, recognising the specific added value of each intervention also requires acting accordingly: we advise against organising short-term events without establishing a community engagement strategy to sustain the event’s momentum.

In hindsight, all of the above may sound obvious. But it is mostly relevant from the perspective of the beneficiary. From the point of view of the organisation running a data literacy programme, the balance of costs and benefits is defined differently.

For example, short-term interventions are a great way to find one’s audience, get new trainers to find their voice, and generate press cheaply. Meanwhile, long-term interventions are costly and their outcomes are harder to measure: is it really worth it to focus on training only 10 people for several months, when the same financial investment can bring hundreds of people to one-day workshops? Even when the organisation can see the benefits, their funders may not. In a field where sustainability is still a complicated issue many organisations face, long-term actions are not a priority.

Next steps

School of Data has taken steps to apply these learnings to its programmes.

  • The Curriculum programme, which initially focused on the production and maintenance of online content available on our website, has been expanded to include offline trainings during our annual event, the Summer Camp, and online skillshares throughout the year;

  • Our recommendations to members regarding their interventions systematically refer to the data literacy matrix in order for them to understand the added value of their work;

  • Our Data Expert programme has been designed to include both data-driven project work and medium-term training of beneficiaries, differentiating it further from straightforward consultancy work.

We have also identified three directions in which we can research this topic further:

  • Mapping existing interventions: the number, variety and quality of data literacy interventions is increasing every year, but so far no effort has been made to map them, in order to identify the strengths and gaps of the field.

  • Investigating individual subgroups: the matrix is a good starting point for interrogating best practices and concrete outcomes in each of the subgroups, in order to provide more granular recommendations to the actors of the field and to inform the design of new intervention models.

  • Exploring thematic relevance: the audience, goals and constraints of, say, data journalism interventions differ substantially from those of the interventions undertaken within the extractives data community. Further research would be useful to see how they differ and to develop topic-relevant recommendations.


Data is a Team Sport: Advocacy Organisations

Dirk Slater - July 12, 2017 in Data Blog, Event report, Research

Data is a Team Sport is our open-research project exploring the data literacy eco-system and how it is evolving in the wake of post-fact, fake news and data-driven confusion.  We are producing a series of videos, blog posts and podcasts based on a series of online conversations we are having with data literacy practitioners.

To subscribe to the podcast series, cut and paste the following link into your podcast manager : or find us in the iTunes Store and Stitcher.

In this episode we discussed data driven advocacy organisations with:

  • Milena Marin is Senior Innovation Campaigner at Amnesty International. She currently leads Amnesty Decoders – an innovative project aiming to engage digital volunteers in documenting human rights violations using new technologies. Previously she worked as programme manager of School of Data. She also worked for over 4 years with Transparency International where she supported TI’s global network to use technology in the fight against corruption.
  • Sam Leon is Data Lead at Global Witness, focusing on the use of data to fight corruption and how to turn this information into change-making stories. He is currently working with a coalition of data scientists, academics and investigative journalists to build analytical models and tools that enable anti-corruption campaigners to understand and identify corporate networks used for nefarious and corrupt practices.

Notes from the Conversation

To get their organisations to see the value and benefit of using data, both have had to demonstrate results and have looked for opportunities where they could show effective impact. What data does for advocacy is show the extent of the problem and provide depth to qualitative and individual stories.  Milena credits the work of School of Data for the fact that journalists now expect there to be data accessible from Amnesty to back up their data.

  • They see a gap between the way advocates can view data and new technologies as easy answers to their challenges and the realities of implementing complex projects that use them.
  • In today’s post-fact world, they find that the term is used as a tactic to discredit their work more quickly, and as a result they need to work harder at presenting verifiable data.
  • Amnesty’s Decoders project has involved 45,000 volunteers and, along with making it possible to review a huge amount of video, has had the side benefit of providing those volunteers with a deeper understanding of what Amnesty does.
  • Global Witness has released a limited number of data-sets to the public, but a lot more needs to be learned about the implications of releasing open data-sets before that can become the default.
  • Intermediaries and externals are the only way for advocacy organisations to cover the gaps in their own expertise around data.

More about their work



Resources and Readings

From FabRiders

View the Full Conversation:



Data is a Team Sport: One on One with Daniela Lepiz

Dirk Slater - July 3, 2017 in Community, Data Blog, Event report, Research

Data is a Team Sport is our open-research project exploring the data literacy eco-system and how it is evolving in the wake of post-fact, fake news and data-driven confusion.  We are producing a series of videos, blog posts and podcasts based on a series of online conversations we are having with data literacy practitioners.

To subscribe to the podcast series, cut and paste the following link into your podcast manager : or find us in the iTunes Store and Stitcher.

This episode features a one-on-one conversation with Daniela Lepiz, a Costa Rican data journalist and trainer, who is currently the Investigation Editor for CENOZO, a West African Investigative Journalism Project that aims to promote and support cross-border data investigation and open data in the region. She has a master’s degree in data journalism from the Rey Juan Carlos University in Madrid, Spain. She was previously involved with OpenUP South Africa, working with journalists to produce data-driven stories.  Daniela is also a trainer for the Tanzania Media Foundation and has been involved in many other projects with South African Media, La Nacion in Costa Rica and other international organisations.

Notes from the conversation

Daniela spoke to us from Burkina Faso and reflected on the role of journalism, and particularly data-driven journalism, in functioning democracies.  The project she is working on empowers journalists working cross-border in West Africa to use data to expose corruption and violations of human rights.  To identify journalists to participate in the project, they have looked for individuals who are experienced, passionate and curious. The project engages existing media houses, such as Premium Times in Nigeria, to ensure that there are places for their stories to appear.

Important points Daniela raises:

  • Media is continually evolving, and Daniela can see that data literacy will be a required proficiency in the next five years.
  • The biggest barrier to achieving open-data in government is government officials who resist transparency.
  • There is a real fear among journalists of having to be proficient in maths when they consider improving their skills to produce data-driven stories.  They often fail to realise that it’s about working with others who have skills in statistics and data analysis.
  • Trust in media has declined so sharply that journalists have to work that much harder, particularly when it comes to labelling things as opinion or as biased.

Resources she finds inspiring

Her blog posts

The full online conversation:

Daniela’s bookmarks!

These are the resources she uses the most often.

.Rddj – Resources for doing data journalism with R
Comparing Columns in Google Refine | OUseful.Info, the blog…
Journalist datastores: where can you find them? A list. | Simon Rogers
AidInfoPlus – Mastering Aid Information for Change

Data skills

Mapping tip: how to convert and filter KML into a list with Open Refine | Online Journalism Blog
Mapbox + Weather Data
Encryption, Journalism and Free Expression | The Mozilla Blog
Data cleaning with Regular Expressions (NICAR) – Google Docs
NICAR 2016 Links and Tips – Google Docs
Teaching Data Journalism: A Survey & Model Curricula | Global Investigative Journalism Network
Data bulletproofing tips for NICAR 2016 – Google Docs
Using the command line tabula extractor tool · tabulapdf/tabula-extractor Wiki · GitHub
Talend Downloads


Git Concepts – SmartGit (Latest/Preview) – Confluence
GitHub For Beginners: Don’t Get Scared, Get Started – ReadWrite
LittleSis – Profiling the powers that be

Tableau customized polygons

How can I create a filled map with custom polygons in Tableau given point data? – Stack Overflow
Using Shape Files for Boundaries in Tableau | The Last Data Bender
How to make custom Tableau maps
How to map geographies in Tableau that are not built in to the product (e.g. UK postcodes, sales areas) – Dabbling with Data
Alteryx Analytics Gallery | Public Gallery
TableauShapeMaker – Adding custom shapes to Tableau maps | Vishful thinking…
Creating Tableau Polygons from ArcGIS Shapefiles | Tableau Software
Creating Polygon-Shaded Maps | Tableau Software
Tool to Convert ArcGIS Shapefiles into Tableau Polygons | Tableau and Behold!
Polygon Maps | Tableau Software
Modeling April 2016
5 Tips for Making Your Tableau Public Viz Go Viral | Tableau Public
Google News Lab
Open Semantic Search: Your own search engine for documents, images, tables, files, intranet & news
Spatial Data Download | DIVA-GIS
Linkurious – Linkurious – Understand the connections in your data
Apache Solr –
Apache Tika – Apache Tika
Neo4j Graph Database: Unlock the Value of Data Relationships
SQL: Table Transformation | Codecademy
dc.js – Dimensional Charting Javascript Library
The People and the Technology Behind the Panama Papers | Global Investigative Journalism Network
How to convert XLS file to CSV in Command Line [Linux]
Intro to SQL (IRE 2016) · GitHub
Malik Singleton – SELECT needle FROM haystack;
Investigative Reporters and Editors | Tipsheets and links
Investigative Reporters and Editors | Tipsheets and Links


More data

2016-NICAR-Adv-SQL/ at master · taggartk/2016-NICAR-Adv-SQL · GitHub
advanced-sql-nicar15/stats-functions.sql at master · anthonydb/advanced-sql-nicar15 · GitHub
2016-NICAR-Adv-SQL/ at master · taggartk/2016-NICAR-Adv-SQL · GitHub
Malik Singleton – SELECT needle FROM haystack;
Statistical functions in MySQL • Code is poetry
Data Analysis Using SQL and Excel – Gordon S. Linoff – Google Books
Using PROC SQL to Find Uncommon Observations Between 2 Data Sets in SAS | The Chemical Statistician
mysql – Query to compare two subsets of data from the same table? – Database Administrators Stack Exchange
sql – How to add “weights” to a MySQL table and select random values according to these? – Stack Overflow
sql – Fast mysql random weighted choice on big database – Stack Overflow
php – MySQL: Select Random Entry, but Weight Towards Certain Entries – Stack Overflow
MySQL Moving average
Calculating descriptive statistics in MySQL | codediesel
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, …
R, MySQL, LM and quantreg
ddi-documentation-english-572 (1).pdf
Categorical Data — pandas 0.18.1+143.g3b75e03.dirty documentation
python – Loading STATA file: Categorial values must be unique – Stack Overflow
Using the CSV module in Python
14.1. csv — CSV File Reading and Writing — Python 3.5.2rc1 documentation
csvsql — csvkit 0.9.1 documentation
weight samples with python – Google Search
python – Weighted choice short and simple – Stack Overflow
7.1. string — Common string operations — Python v2.6.9 documentation
Introduction to Data Analysis with Python |
A Complete Tutorial to Learn Data Science with Python from Scratch
GitHub – fonnesbeck/statistical-analysis-python-tutorial: Statistical Data Analysis in Python
Verifying the email – Email Checker
A little tour of aleph, a data search tool for reporters – (Friedrich Lindenberg)
Welcome – Investigative Dashboard Search
Investigative Dashboard
Working with CSVs on the Command Line
FiveThirtyEight’s data journalism workflow with R | useR! 2016 international R User conference | Channel 9
Six issue when installing package · Issue #3165 · pypa/pip · GitHub
python – Installing pip on Mac OS X – Stack Overflow
Source – Journalism Code, Context & Community – A project by Knight-Mozilla OpenNews
Introducing Kaggle’s Open Data Platform
NASA just made all the scientific research it funds available for free – ScienceAlert
District council code list | Statistics South Africa
How-to: Index Scanned PDFs at Scale Using Fewer Than 50 Lines of Code – Cloudera Engineering Blog
GitHub – gavinr/geojson-csv-join: A script to take a GeoJSON file, and JOIN data onto that file from a CSV file.
7 command-line tools for data science
Python Basics: Lists, Dictionaries, & Booleans
Jupyter Notebook Viewer


New folder

Reshaping and Pivot Tables — pandas 0.18.1 documentation
Reshaping in Pandas – Pivot, Pivot-Table, Stack and Unstack explained with Pictures – Nikolay Grozev
Pandas Pivot-Table Example – YouTube
pandas.pivot_table — pandas 0.18.1 documentation
Pandas Pivot Table Explained – Practical Business Python
Pivot Tables In Pandas – Python
Pandas .groupby(), Lambda Functions, & Pivot Tables
Counting Values & Basic Plotting in Python
Creating Pandas DataFrames & Selecting Data
Filtering Data in Python with Boolean Indexes
Deriving New Columns & Defining Python Functions
Python Histograms, Box Plots, & Distributions
Resources for Further Learning
Python Methods, Functions, & Libraries
Python Basics: Lists, Dictionaries, & Booleans
Real-world Python for data-crunching journalists | TrendCT
Cookbook — agate 1.4.0 documentation
3. Power tools — csvkit 0.9.1 documentation
Tutorial — csvkit 0.9.1 documentation
4. Going elsewhere with your data — csvkit 0.9.1 documentation
2. Examining the data — csvkit 0.9.1 documentation
A Complete Tutorial to Learn Data Science with Python from Scratch
For Journalism
ProPublica Summer Data Institute
Percentage of vote change | CARTO
Data Science | Coursera
Data journalism training materials
Pythex: a Python regular expression editor
A secure whistleblowing platform for African media | afriLEAKS
PDFUnlock! – Unlock secured PDF files online for free.
The digital journalist’s toolbox: mapping | IJNet
Bulletproof Data Journalism – Course – LEARNO
Transpose columns across rows (grefine 2.5) ~ RefinePro Knowledge Base for OpenRefine
Installing NLTK — NLTK 3.0 documentation
1. Language Processing and Python
Visualize any Text as a Network – Textexture
10 tools that can help data journalists do better work, be more efficient – Poynter
Workshop Attendance
Clustering In Depth · OpenRefine/OpenRefine Wiki · GitHub
Regression analysis using Python
R for Every Survey Analysis – YouTube
Git – Book
NICAR17 Slides, Links & Tutorials #NICAR17 // Ricochet by Chrys Wu
Register for Anonymous VPN Services | PIA Services
The Bureau of Investigative Journalism
dtSearch – Text Retrieval / Full Text Search Engine
Investigation, Cybersecurity, Information Governance and eDiscovery Software | Nuix
How we built the Offshore Leaks Database | International Consortium of Investigative Journalists
Liz Telecom/Azimmo – Google Search
First Python Notebook — First Python Notebook 1.0 documentation
GitHub – JasonKessler/scattertext: Beautiful visualizations of how language differs among document types



Data is a Team Sport: Data-Driven Journalism

Dirk Slater - June 20, 2017 in Community, Data Blog, Event report, Research

Data is a Team Sport is our podcast series exploring the ever-evolving data literacy eco-system. Cut and paste this link into your podcast app to subscribe: or find us in the iTunes Store and Stitcher.

In this episode we speak with two veteran data literacy practitioners who have been involved with developing data-driven journalism teams.

Our guests:

  • Eva Constantaras is a data journalist specialized in building data journalism teams in developing countries. These teams have reported from across Latin America, Asia and East Africa on topics ranging from displacement and kidnapping by organized crime networks to extractive industries and public health. As a Google Data Journalism Scholar and a Fulbright Fellow, she developed a course for investigative and data journalism in high-risk environments.
  • Natalia Mazotte is Program Manager of School of Data in Brazil and founder and co-director of the digital magazine Gender and Number. She has a Master’s degree in Communications and Culture from the Federal University of Rio de Janeiro and a specialization in Digital Strategy from Pompeu Fabra University (Barcelona/Spain). Natalia has been teaching data skills in different universities and newsrooms around Brazil. She also works as an instructor in online courses at the Knight Center for Journalism in the Americas, a project of the University of Texas, and writes for international publications such as SGI News, Bertelsmann-Stiftung, Euroactiv and Nieman Lab.

Notes from this episode

They both describe the lessons learned in getting journalists to use data that can drive social change. For Eva, this means getting journalists to work harder, because just reporting that corruption exists is not enough, while Natalia talks about how they use data on gender to drive debate and discussion around equality. What is critical for democracy is the existence of good journalism, and this includes data-driven journalism that uncovers facts and gets at the root causes.

Gaps in the Data Literacy EcoSystem:

Natalia points out that corporations and governments have the power because they are data-literate and can use data effectively, while people in low-income communities, such as favelas, really suffer because they are at the mercy of what story gets told by looking at the ‘official’ data.

Eva feels that there has been too much emphasis on short-term and quick solutions from individuals who have put a lot of money in making sure that data is ready and accessible.  Donors need to support more long-term efforts and engagement around data-literacy.

Adjusting to a ‘post-fact’ world means:

Western journalists have spent too much time reporting on polling data rather than on policies, and it’s important for newer journalists to understand why that was problematic.

In Brazil, the mainstream media is focusing on ‘what’s happened’ while independent media is focusing on ‘why it’s happened’, and this means the media landscape is changing.

They also talked about:

  • Ethics and the responsibility inherent in gathering and storing data, along with the grey areas around privacy.
  • How to get media outlets to value data-driven journalism by getting them to understand that people are increasingly getting their ‘breaking news’ from social media, so they need to look at providing more in-depth stories.

They wanted to plug:

Readings/Resources they find inspiring for their work.

Resources contributed from the participants:

View the online conversation in full:


Data is a Team Sport: Enabling Learning

Dirk Slater - June 6, 2017 in Community, Event report, Research

Data is a Team Sport is our podcast series exploring the ever-evolving data literacy eco-system. Cut and paste this link into your podcast app to subscribe: or find us in the iTunes Store.

In this episode we speak with two veteran data literacy practitioners who have been involved with directly engaging learners to get beyond spreadsheets to build confidence and take agency in their own learning.

Our guests:

  • Rahul Bhargava is a researcher and technologist specializing in civic technology and data literacy. He creates interactive websites used by hundreds of thousands, playful educational experiences across the globe, and award-winning visualizations for museum settings. As a research scientist at the MIT Center for Civic Media, Rahul leads technical development on projects ranging from interfaces for quantitative news analysis to platforms for crowd-sourced sensing.
  • Lucy Chambers initially embarked on a career as a journalist, then took a few turns which led to a career at Open Knowledge teaching journalists how and why to work with data. She was one of the editors of the Data Journalism Handbook. She later led the highly successful School of Data programme, which extended technical training to non-profit organisations. Lately, she has focussed on delivery of software projects as a product manager. Most recently, she has been working in West Africa on health-related software.

Notes from this episode

Rahul described methods for getting data novices to think more creatively by drawing, and for using a gallery of their artwork to build their confidence to think more critically. He says that this experience is what led to the creation of, a website designed specifically to engage learners.

Lucy tells of School of Data’s initial struggles with setting up a one-size-fits-all online curriculum. They learned through focus groups and testing that a tool-based approach was neither helpful nor achievable. Instead they needed a people-based approach. They then turned to developing a fellowship programme, which is very much at the core of the School of Data network.

Both of our guests had strong opinions about building a data literacy culture in organisations. A common mistake is letting the IT department provide data training. Organisations also often produce data metrics and dashboards that don’t actually help staff get a full picture of progress.

Gaps in the Data Literacy EcoSystem:

  • Toolbuilders not understanding and not building for learners.
  • NGOs not testing out data-driven messages with their audiences before they release them.

Adjusting to a ‘post-fact’ world means:

  • We need to make sure that people understand that data is not necessarily truth, that it is often used as rhetoric and that it carries bias. Data sets should have a biography attached.
  • Narrative wins, so data presentation methods that bombard the audience with facts and figures just don’t work. We have to spend more time pulling out the compelling narrative from the data.

They wanted to plug:

  • Rahul is building a cohort around further development of Ping him via Twitter to get more information on that.
  • Lucy’s blog is Tech to Human and she writes about her work and what she’s learning. She is working on a project for MySociety called EveryPolitician and writing about it on Medium.

Data Journalism in Turkey: still a new topic

Pınar Dag - July 27, 2016 in Event report, Research

School of Data now counts among its ranks a local group in Turkey led by Pınar Dağ, an experienced data journalist and journalism professor based in Istanbul. As part of their activities, they have been running numerous data journalism trainings, attracting a significant proportion of non-journalists eager to learn about data. The article below presents a data-driven overview of these workshops.

According to the participation survey of the data journalism workshops hosted by Pınar Dağ and Sadettin Demirel, carried out with all 110 participants, 36.4% of participants stated that they had taken data literacy courses before the workshops, while the remaining 70 people (63.6%) had never before studied data literacy or data analysis.


If we analyse the data by gender and age, 21 men and 17 women said that they had previous training in data analysis. The more interesting figure, however, concerns the younger generation: 65% of participants are between the ages of 18 and 25 and have no experience or previous training in data literacy.


These numbers indicate that, even though the participants encountered data analysis and data literacy through the data journalism workshops, participants in the 18-25 age range have a serious lack of data literacy.



The workshops contribute to spreading data journalism terminologies



Data journalism has its own terminology and vocabulary. To evaluate how participants learned the main terms, such as data journalism, open data, open government, data portal and data visualization, we asked them whether they had learned these terms through the workshops or elsewhere.



More than half of the participants said that they already knew the terms data journalism and open data. On the other hand, 36 people said that they came to understand data journalism thanks to the workshops, whereas 41 participants said the same for open data.


In addition, more participants said that they learned the terms open government, data portal and data visualization through the data journalism training than said that they already knew them. In numbers: 70 of 110 participants learned about data portals through the workshops, 53 about open government and 51 about data visualisation.


The good news is that the number of participants in the 18-25 age range who learned data journalism terms thanks to the workshops is more than twice the number who already knew them. These statistics underline that the workshops make the terminology easier to understand.


90 percent of participants like the data journalism workshops


99 participants (90%) said that they liked the data journalism training. Seven people were undecided and four said they didn’t like it. The participants appreciated not only the workshops but also the instructors, the content of the training and the guests.
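As a quick check of the figures above, here is a minimal Python sketch that turns the reported answer counts into percentage shares. The counts come from the survey results quoted in this post; the variable names and the rounding to one decimal place are our own assumptions for illustration.

```python
# Answer counts reported for "Did you like the data journalism training?"
responses = {"liked": 99, "undecided": 7, "did not like": 4}

# The counts should add up to the 110 survey participants.
total = sum(responses.values())

# Share of each answer as a percentage, rounded to one decimal place.
shares = {answer: round(100 * count / total, 1)
          for answer, count in responses.items()}

for answer, share in shares.items():
    print(f"{answer}: {responses[answer]} of {total} ({share}%)")
```

The same few lines can be reused for any of the other questions by swapping in the counts for that question.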


As the data indicate, 87.1% of people were very pleased with the content of the workshops, while 102 participants said they liked the guests and 94 said they appreciated the data journalism instructors. Most participants had a positive attitude towards the content, instructors and guests of the workshops. Some participants remained undecided or did not appreciate some features of the data journalism training.




52.8% of participants did not like the duration of the workshops



While 30 participants (27.2%) said that they liked the duration and timing of the workshops, 28 people (25.4%) were undecided. So 52.8% of participants, the remaining 58 people, were not pleased with the length of the workshops.


There was also a negative perception of the infrastructure and internet network among the participants: 54.1% of participants said that they didn’t appreciate the internet network provided for the data journalism activities.


However, most of the participants were satisfied with the accommodation (65.4%), transportation (74.1%) and catering services (66.3%) provided during the workshops.


More than half of the participants heard about workshops via social media


The question was ‘how did you find out about workshops and where do you get workshop news and announcements?’ The participants stated that they found out about the workshops and kept in touch with them predominantly via social media (54%), followed by the website (35.4%), e-mail (30%), instructors at universities (26.3%) and friends (17.7%).


It seems that digital communication channels, especially social media and e-mail, played an important role in reaching participants and keeping them informed.

53 percent of participants stated: I can make data visualizations



More than half of the participants said that they can create data visualizations thanks to the workshops, while 51% of them, a total of 57 participants, said that the workshops introduced them to various kinds of data sources and helped them to work with those sources. Also, 41% of participants stated that they have developed their data analysis skills and 40% said that they can work with data thanks to the workshops.




All we need is longer workshops


The last two questions of the survey asked participants what they would suggest to improve the data journalism workshops and to increase and spread data literacy.


87% of the participants, a total of 86 people, suggested that they need longer-running workshops built around producing data journalism projects. This was the most supported suggestion among the options. The other options were cooperating with journalism associations, inviting international data journalists to the workshops and arranging MOOC programs to increase efficiency and reach more people.



University curricula need data literacy courses

To increase data literacy, 74.5% of participants indicated that university curricula need more data literacy and data journalism courses, which they suggest could be added to the current education plans. In addition, 60.9% of them think the key is open data: if the government shared more sources of open data, that could improve data literacy in Turkey.



There were other suggestions too, for example cooperation among journalists, NGOs and developers, and government funding support to create a data-savvy generation.



Last but not least, we asked all the participants, ‘If you had the chance, would you want to attend more workshops?’ 103 of 110 participants said yes, they would.



A quantitative method was used for this research, with data gathered by survey. Participants were reached via e-mail and a Google Form was used for the questionnaire. The population of this research is the participants of the last 10 data journalism workshops. Because the number of participants varied between 10 and 20 per workshop, the number of participants was taken as 15 per workshop, so the research universe is 150 people. The sample size was calculated with a 95% confidence level and a 5% margin of error, and was accepted as 109 participants.
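For readers who want to reproduce the sample size, here is a minimal Python sketch. The post does not say which formula was used, so this assumes Cochran's formula with a finite population correction, a z-score of 1.96 for the 95% confidence level and the most conservative response proportion of 0.5. The function and variable names are ours, for illustration only.

```python
import math

def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for a proportion, corrected for a finite population.

    Assumes Cochran's formula; z=1.96 corresponds to 95% confidence and
    p=0.5 is the most conservative guess for the response proportion.
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction.
    n = n0 / (1 + (n0 - 1) / population)
    # Round up, since a fraction of a respondent is not possible.
    return math.ceil(n)

workshops = 10
assumed_per_workshop = 15  # midpoint of the 10 to 20 range
population = workshops * assumed_per_workshop

print(population, sample_size(population))
```

Under these assumptions the result for a universe of 150 people is 109, which matches the sample size accepted for the research.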


Tableau and Google Charts were used for the data visualisations.

Research Results Part 5: Improving Data Literacy Efforts

Dirk Slater - February 5, 2016 in Research

As technologies advance and the accessibility of data becomes ubiquitous, data literacy skills will likely gain increasing importance. The School of Data training resources have already laid an important foundation for social change efforts to harness data and improve their impact. Going forward, School of Data local communities will have to take into account their role as stewards of the curriculum, and continue to develop and incorporate new learnings as access to data continues to increase.

From what we (Mariel Garcia and myself) have learned by conducting this research, we make the following recommendations:

  • Training the trainers: The School of Data curriculum is the foundation for much of the Data Literacy training that is happening both inside and outside the School of Data network, as reported by interviewees; it would make sense to focus efforts on preparing materials not just for learner consumption, but also in a curriculum format for trainers.
  • More research on pedagogical methods: Additional research and establishment of effective pedagogical methods of data literacy training would be beneficial – many interviewees mentioned the importance of this topic, and yet had no resources to share about it. In this regard, Peer to Peer University is the one participant that has invested most resources into this understanding, and is a great ally going forward in this area.
  • More knowledge-sharing within the network: In this regard, the School of Data network also functions as a ‘community of practice’ for trainers who are sharing advice and tips on providing data literacy training, but this could be strengthened by actively promoting conversations around the topics covered in this research.
  • Measuring the impact: As with different initiatives, impact evaluation is an area in which data literacy work can still grow. Both the School of Data local communities and data literacy related organisations need much stronger articulations of their long-term goals and intended impact in the short term.  School of Data events might be a good space to have the necessary conversations to find frameworks of evaluation that work for different work formats and budgets. Some organizations outside of the School of Data network (IREX and Internews) have worked extensively on this, and could be good references going forward.
  • Promoting long term engagements: It appeared during the research that only older and established organisations had started long term projects and engagements related to data literacy. Consequently, it might make sense for School of Data to help smaller and newer organisations within its community to start and sustain long term engagements, by helping them find the necessary resources. This could provide an important focal point for collaborations within the network as it will likely yield important learnings.
  • Data literacy at the organisation level: Articulate how individual data literacy training can complement and support long term engagements that will lead to organisational data literacy. Building local fellowship programs that can engage social change organisations over the long-term and build their capacity to utilise data in their campaigns will likely lead to deeper alliances and joint funding opportunities.
  • Better collaboration with outside partners: The project would stand to benefit from more linkages and collaborations with academia and open data-related civil society efforts. Additionally, more efforts can be made to improve the accessibility of the School of Data curriculum, methodologies and trainings. This will likely lead to more diverse and sustainable funding.

The goal of this research was to empower the School of Data Steering Committee to take strategic decisions about the programme going forward, along with helping the School of Data network members build on the successes to date. We hope that by providing this research and these recommendations in an accessible format, both School of Data and the wider network of data literacy practitioners will benefit. Hopefully, these research results will complement and contribute to School of Data’s goal of improving the impact of social change efforts through data literacy.

In our next and final blog post, we will present a list of resources and references we used during our research.
