Ten Cool Things I Learned at DataJConf

- August 18, 2017 in Events, Fellowship

This article was cross-posted from its original location at the Open and Shut blog

I had a fantastic time at the European Computational and Data Journalism Conference in Dublin on 6-7 July in the company of many like-minded data journalists, academics, and open data practitioners. There were a lot of stimulating ideas shared during the presentations on the first day, the unconference on the second day, and the many casual conversations in between!

In this post I’d like to share the ten ideas that stuck with me the most (it was tough to whittle it down to just ten!). Hopefully you’ll find these thoughts interesting, and hopefully they’ll spark some worthwhile discussions about data journalism and storytelling.

I’d really love to hear what you have to say about all of this, so please do share any thoughts or observations that you might have below the line!


The European Data and Computational Journalism Conference, Dublin, 6-7 July 2017

  1. ‘Deeper’ data journalism is making a real impact

Marianne Bouchart – manager of the Data Journalism Awards – gave a presentation introducing some of the most exciting award winners of 2017, and talked about some of the most important new trends in data journalism today. Perhaps unsurprisingly, given the electoral rollercoasters of the past year, a lot of great data journalism has been centred around electioneering and other political dramas.

Marianne said that “impact” was the theme that ran through the best pieces produced last year, and she really stressed the central role that investigative journalism needs to play in producing strong data-driven stories. She said that impactful investigative journalism is increasingly merging with data journalism, as we saw in projects shedding light on shady anti-transparency moves by Brazilian politicians, investigating the asset-hoarding of Serbian politicians, and exposing irresponsible police handling of sexual assault cases in Canada.

  2. Machine learning could bring a revolution in data journalism

Two academics presented on the latest approaches to computational journalism – journalism that applies machine learning techniques to dig into a story.

Marcel Broersma from the University of Groningen presented on an automated analysis of politicians’ use of social media. The algorithm analysed 80,000 tweets from Dutch, British and Belgian politicians to identify patterns of what he called the ‘triangle of political communication’ between politicians, journalists, and citizens.

The project wasn’t without its difficulties, though – algorithmically detecting sarcasm remained a challenge, and the limited demographics of Twitter users meant that this kind of research could only look at how certain narrow segments of society communicated.
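As a purely illustrative sketch – the handles and rules below are invented, not Broersma’s actual method – classifying a politician’s @-mentions into the three corners of the triangle might look something like this:

```python
import re
from collections import Counter

# Hypothetical handle lists: the real study used curated lists of
# Dutch, British and Belgian accounts. These are stand-ins.
POLITICIANS = {"mp_smith", "minister_jones"}
JOURNALISTS = {"daily_reporter", "newsdesk_anna"}

def classify_mentions(tweet_text):
    """Map each @-mention in a tweet onto a corner of the
    'triangle of political communication'."""
    corners = []
    for handle in re.findall(r"@(\w+)", tweet_text.lower()):
        if handle in POLITICIANS:
            corners.append("politician")
        elif handle in JOURNALISTS:
            corners.append("journalist")
        else:
            corners.append("citizen")
    return corners

tweets = [
    "Great debate with @mp_smith tonight!",
    "My answer to @daily_reporter's question: no new taxes.",
    "Thanks @jane_doe_1982 for your support!",
]

# Tally which corners of the triangle these tweets address
counts = Counter(c for t in tweets for c in classify_mentions(t))
print(counts)
```

The real work, of course, is in curating the account lists and handling the messy cases (sarcasm included) – but the counting itself is simple.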

Jennifer Stark from the University of Maryland looked at the potential for algorithms to be biased – specifically examining Google Image Search’s representations of presidential candidates Hillary Clinton and Donald Trump during their campaigns. Using an image recognition API that detects emotions, she found that Clinton’s pictures were biased towards showing her appearing happier, whereas for Trump both happiness and anger were overrepresented.
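To make the idea concrete, here is a minimal, self-contained sketch of that kind of bias check. The scores are invented toy numbers standing in for the per-image likelihoods an emotion-detection API might return; the interesting part is comparing each subject’s average emotion profile against a pooled baseline:

```python
from statistics import mean

# Toy per-image emotion likelihoods (0..1), invented for illustration
images = [
    {"subject": "A", "joy": 0.9, "anger": 0.1},
    {"subject": "A", "joy": 0.8, "anger": 0.2},
    {"subject": "B", "joy": 0.7, "anger": 0.6},
    {"subject": "B", "joy": 0.2, "anger": 0.8},
]

def emotion_profile(rows, emotions=("joy", "anger")):
    """Average score per emotion over a set of image results."""
    return {e: mean(r[e] for r in rows) for e in emotions}

baseline = emotion_profile(images)  # pooled across all subjects
for subject in ("A", "B"):
    profile = emotion_profile([r for r in images if r["subject"] == subject])
    # Positive differences = emotions overrepresented for this subject
    over = {e: round(profile[e] - baseline[e], 2) for e in profile}
    print(subject, over)
```

A real analysis would also need to control for how many images each search returned and when they were crawled, but the overrepresentation comparison is the core of it.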

Although it’s still early days for computational journalism, talks like these hinted at exciting new data journalism methods to come!

  3. There are loads of ways to learn new skills!

The conference was held at the beautiful University College Dublin, where a brand new master’s program in data journalism is being launched this year. We also heard from one of the conference organisers, Martin Chorley, about Cardiff University’s Master’s in Computational and Data Journalism, which has been going strong for three years, and has had a great track record of placing students into employment.

But formal education isn’t the only way to get those cutting-edge data journo skills! One of the conference organisers also presented the results of a worldwide survey of data journalists, taking in responses from 180 data journalists across 44 countries. One of the study’s most notable findings was that only half of respondents had formal training in data journalism – the rest picked up the necessary skills all by themselves. Also, when asked how they wanted to further their skills, more respondents said they wanted to brush up on their skills in short courses rather than going back to school full-time.

  4. Want good government data? Be smart (and be charming)!

One of the most fascinating parts of the conference for me was learning about the different ways data journalists obtained data for their projects.

Kathryn Tourney from The Detail in Northern Ireland found Freedom of Information requests useful, but with the caveat that you really needed to know the precise structure of the data you were requesting in order to get the best results. Kathryn would conduct prior research on the exact schemas of government databases and work to get hold of the forms that the government used to collect the data she wanted before making the actual FOI requests. This ensured that there was no ambiguity about what she’d receive on the other side!

Conor Ryan from Ireland’s RTÉ found that he didn’t need to make FOI requests to do deep investigative work, because there was already a lot of government data “available” to the public. The catch was that this data was often buried behind paywalls and multiple layers of bureaucracy.

Conor stressed the importance of ensuring that any data sources RTÉ managed to wrangle were also made available in a more accessible way for future users. One example related to accessing building registry data in Ireland, where originally a €5 fee existed for every request made. Conor and his team pointed out this obstacle to the authorities and persuaded them to change the rules so that the data would be available in bulk in the future.

Lastly, during the unconference one story from Bulgaria really resonated with my own experiences trying to get a hold of data from governments in closed societies. A group of techies offered the Bulgarian government help with an array of technical issues, and by building relationships with staff on the ground – as well as getting the buy-in of political decision makers – they were able to get their hands on a great deal of data that would have forever remained inaccessible if they’d gone through the ‘standard’ channels for accessing public information.

  5. The ethics of data sharing are tricky

The best moments at these conferences are the ones that make you go: “Hmm… I never thought about it that way before!”. During Conor Ryan’s presentation, he really emphasized the need for data journalists to consider the ethics of sharing the data that they have gathered or analysed.

He pointed out that there’s a big difference between analysing data internally and reporting on a selected set of verifiable results, and publishing the entire dataset from your analysis publicly. In the latter case, every single row of data becomes a potential defamation suit waiting to happen. This is especially true when the dataset involved is disaggregated down to the level of individuals!

  6. Collaboration is everything

Being an open data practitioner means that my dream scenarios are collaborations on data-driven projects between techies, journalists and civil society groups. So it was really inspiring to hear Megan Lucero talk about how The Bureau Local (at the Bureau of Investigative Journalism) has built up a community of civic techies, local journalists, and civil society groups across the UK.

Even though The Bureau Local was only set up a few months ago, they quickly galvanized this community around the 2017 UK general elections, and launched four different collaborative investigative data journalism projects. One example is their piece on targeted ads during the election campaign, where they collaborated with the civic tech group Who Targets Me to collect and analyse data about the kinds of political ads targeting social media users.

I’d love to see more experiments like The Bureau Local emerging in other countries as well! In fact, one of the main purposes of Open and Shut is precisely to build this kind of community for folks in closed societies who want to collaborate on data-driven investigations. So please get involved!


Who Targets Me is an initiative working to collect and analyse data about the kinds of political ads targeting social media users.

  7. Data journalism needs cash – so where can we find it?

It goes without saying that journalism is having a bad time of it at the moment. Advertising and subscription revenues don’t pull in nearly as much cash as they used to. Given that pioneering data-driven investigative journalism takes a lot of time and effort, the question that naturally arises is: “where do we get the money for all this?”. Perhaps unsurprisingly, no-one at DataJConf had any straightforward answers to this question.

A lot of casual conversations in between sessions drifted onto the topic of funding for data journalism, and lots of people seemed worried that innovative work in the field is currently too dependent on funding from foundations. That being said, attendees also shared stories about interesting funding experiments being undertaken around the world, with the Korean Center for Investigative Journalism’s crowdfunding approach gaining some interest.

  8. Has data journalism been failing us?

In the era of “fake news” and “alternative facts”, a recurring topic in many conversations was whether data journalism actually has any serious positive impact. During the unconference discussions, some of us ended up being sucked into the black hole question of “What constitutes proper journalism anyway?”. It wasn’t all despair and navel-gazing, however, and we definitely identified a few concrete things that could be improved.

One related to the need to better represent uncertainty in data journalism. This ties into questions of improving the public’s data literacy, but also of traditional journalism’s tendency to present attention-grabbing leads and conclusions without doing enough to convey complexity and nuance. People kept referencing FiveThirtyEight’s election prediction page, which contained a sophisticated representation of the uncertainty in their modelling, but hid it all below the fold – an editorial decision, it was argued, that lulled readers into thinking that the big number that they saw at the top of the page was the only thing that mattered.
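One small, concrete way to keep uncertainty next to the headline number is to publish an interval alongside the point estimate rather than the point estimate alone. This is an illustrative sketch using simulated forecast runs – the numbers are invented, not any outlet’s actual model:

```python
import random
from statistics import median, quantiles

random.seed(0)

# Toy stand-in for a forecasting model: 1,000 simulated vote shares
simulated_shares = [random.gauss(52.0, 3.5) for _ in range(1000)]

def summarise(runs):
    """Headline figure plus an 80% interval, so the spread travels
    with the point estimate instead of being buried below the fold."""
    deciles = quantiles(runs, n=10)      # 9 cut points
    low, high = deciles[0], deciles[-1]  # 10th and 90th percentiles
    return {
        "headline": round(median(runs), 1),
        "low": round(low, 1),
        "high": round(high, 1),
    }

print(summarise(simulated_shares))
```

Whether readers actually absorb the interval is an editorial and design question, but at least the spread is computed and available at the top of the page rather than hidden in an appendix.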


FiveThirtyEight’s forecast of the 2016 US elections showed a lot of details below the fold about their forecasting model’s uncertainty, but most readers just looked at the big percentages at the top.

Another challenge identified by attendees was that an enormous amount of resources were being deployed to preach to the choir instead of reaching out to a broader base of readers. The unconference participants pointed out that a lot of the sophisticated data journalism stories written in the run-up to the 2016 US elections were geared towards partisan audiences. We agreed that we needed to see more accessible, impactful data stories that were not so mired in party politics, such as ProPublica’s insightful piece on rising US maternal mortality rates.

  9. Data journalism can be incredibly powerful in the Global South

Many of the talks were about data journalism as it is practised in Western countries – with one notable exception. Eva Constantaras, who trains investigative data journalism teams in the Global South, gave a wonderful presentation about the impact of data journalism in the developing world. She cited the examples of IndiaSpend in India and The Nation in Kenya, and spoke about how their data-driven stories worked to identify problems that resonated with the public and explain them in an accessible and impactful way.

The election coverage in the two examples Eva shared focused on investigating the consequences of politicians’ policy proposals, engaging in fact-checking, and identifying the kinds of problems that voters actually face.

Without the burden of partisan echo-chambers, and because data journalism is still novel in many parts of the world, it could end up having a huge impact on public debate and storytelling in the Global South. Watch this space!


Kenya’s The Nation has been producing data-driven stories more and more frequently, such as this piece on Kenya’s Eleventh Elections in August 2017

  10. Storytelling has to connect on a human level

If there was one recurring theme that I heard throughout the conference about what makes data journalism impactful, it was that the data-driven story has to connect on a human level. Eva had a slide in her talk with a quote from John Steinbeck about what makes a good story:

“If a story is not about the hearer he [or she] will not listen… A great lasting story is about everyone, or it will not last. The strange and foreign is not interesting – only the deeply personal and familiar.”

“I want loads of money” — Councillor Hugh McElvaney caught on hidden camera video from RTÉ

Conor from RTÉ also drove the same point home. After his team’s extensive data-driven investigative work revealed corruption in Irish politics, the actual story that they broke involved a hidden-camera video of an undercover interview with one of these politicians. This video highlighted just one datapoint in a very visceral way, which ultimately resonated more with the audience than any kind of data visualisation could.


I could go on for longer, but that’s probably quite enough for one blog post! Thanks for reading this far, and I hope you managed to gain some nice insights from my experiences at DataJConf. It was a fascinating couple of days, and I’m looking forward to building upon all of these exciting new ideas in the months ahead! If any of these thoughts have got you excited, curious (or maybe even furious) we’d love to hear from you below the line.

Open & Shut is a project from the Small Media team. Small Media are an organisation working to support freedom of information in closed societies, and are behind the portal Iran Open Data.


Data is a Team Sport: Government Priorities and Incentives

- August 13, 2017 in Data Blog, Event report

Data is a Team Sport is a series of online conversations held with data literacy practitioners in mid-2017 that explores the ever-evolving data literacy ecosystem.

To subscribe to the podcast series, cut and paste the following link into your podcast manager: http://feeds.soundcloud.com/users/soundcloud:users:311573348/sounds.rss or find us in the iTunes Store and Stitcher.

[soundcloud url=”https://api.soundcloud.com/tracks/337462183″ params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]

The conversation in this episode focuses on the challenges of getting governments to prioritise data literacy both externally and internally, and on the incentives to produce open data. It features:

  • Ania Calderon, Executive Director at the Open Data Charter, a collaboration between governments and organisations working to open up data based on a shared set of principles. For the past three years, she led the National Open Data Policy in Mexico, delivering a key presidential mandate. She established capacity building programs across more than 200 public institutions.
  • Tamara Puhovski, a sociologist, innovator, public policy junkie and open government consultant. She describes herself as a time traveler journeying back to 19th and 20th century public policy centers and trying to bring them back to the future.

Notes from the conversation:

The conversation focused on the challenges they face in pressuring governments to open up their data and make it available for public use. In order for governments to develop and maintain open data programmes, there needs to be an ecosystem of data-literate actors: knowledgeable civil servants and incentivised elected officials, held to account by a critical-thinking citizenry supported by smart open data advocates. Governments’ incentives for open data can’t be based solely on budgetary or monetary savings; they need to be motivated to use data to improve the effectiveness of their programmes.

Once officials are elected, it’s too late to educate and motivate them enough to push for open data programmes, as they have too many other priorities and pressures. They need to reach a level of data literacy that provides enough knowledge, motivation and commitment to open data before they are elected.

Arguments for open data are not having adequate impact on their own; advocates need to provide good, solid stories and examples of its benefits. Those advocating for open data have perhaps been too optimistic that citizens would find the data useful once it was released. A ‘supply and demand’ frame is still an important way to look at open data projects and assess their potential for impact.

Access to government-produced open data is critical for healthy, functioning democracies, but governments’ ability to release open data depends heavily on their own capacity to produce and work with data. There is currently not enough technical support for public officials tasked with implementing open data projects.

Resources mentioned in the conversation:

Also, not mentioned, but be sure to check out Tamara’s work on Open Youth

View the full online conversation:

[youtube https://www.youtube.com/watch?v=kK8Pz4DZ06s]


Data is a Team Sport: Mentors Mediators and Mad Skills

- August 7, 2017 in Community, Data Blog, Event report

Data is a Team Sport is a series of online conversations held with data literacy practitioners in mid-2017 that explores the ever-evolving data literacy ecosystem.

To subscribe to the podcast series, cut and paste the following link into your podcast manager: http://feeds.soundcloud.com/users/soundcloud:users:311573348/sounds.rss or find us in the iTunes Store and Stitcher.

[soundcloud url=”https://api.soundcloud.com/tracks/336474972″ params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]

This episode features:

  • Emma Prest oversees the running of DataKind UK, leading the community of volunteers and building understanding about what data science can do in the charitable sector. Emma sits on the Editorial Advisory Committee at the Bureau of Investigative Journalism. She was previously a programme coordinator at Tactical Tech, providing hands-on help for activists using data in campaigns. 
  • Tin Geber has been working on the intersection of technology, art and activism for most of the last decade. In his previous role as Design and Tech Lead for The Engine Room, he developed role-playing games for human rights activists; collaborated on augmented reality transmedia projects; and helped NGOs around the world to develop creative ways to combine technology and human rights.

Notes from the conversation

In this episode we discussed ways to move organisations beyond data literacy to the point of data maturity, where they are able to manage data-driven projects on their own. Training in itself can be helpful with hard skills, such as how to do analysis, but in terms of learning how to run a data project, Emma asserts that you have to run a project alongside an organisation, as it takes a lot of hand-holding. There needs to be commitment across the entire organisation to implement a data project, as it will take support and input from all parts. The goal of DataKind UK’s long-term engagements is to help an organisation build an understanding of what good data practice is.

Tin points out how critical it is for organisations to be able to learn from others that are working in similar contexts and environments. While there are international networks and resources that are accessible, his biggest challenge is identifying local networks that his clients can connect with and receive peer support.

Another critical element for reaching data maturity is the existence of champions striving to develop good data practice within an organisation. Tin and Emma both acknowledge that these individuals are rare, have a unique skill set, and are often not in senior management positions. There’s a need for greater support for them in the form of mentoring, networks of practice, and training courses that focus on how other organisations have successfully run data projects.

Intermediaries are often focused on demystifying new technologies for civil society organisations. There is currently a lot of emphasis on grappling with the implications of machine learning, but it tends to dwell on the negative impacts (e.g. Cathy O’Neil’s book ‘Weapons of Math Destruction’); there needs to be greater examination of positive impacts, and of stories of CSOs using it well and contributing to social good.

DataKind UK’s resources:

Tin’s resources:

Resources that are inspiring Emma’s Work:

Resources that are inspiring Tin’s work:

  • DataBasic.io – A suite of easy-to-use web tools for beginners that introduce concepts of working with data
  • Media Manipulation and Disinformation Online – Report from Data and Society on how false or misleading information is having real and negative effects on the public consumption of news.
  • Raw Graphs – The missing link between spreadsheets and data visualization

View the full online conversation:

[youtube https://www.youtube.com/watch?v=GmsPcLw4Mec&w=560&h=315]


Data is a Team Sport: One on One with Friedhelm Weinberg

- July 29, 2017 in Data Blog, Event report

Data is a Team Sport is a series of online conversations held with data literacy practitioners in mid-2017 that explores the ever-evolving data literacy ecosystem.

To subscribe to the podcast series, cut and paste the following link into your podcast manager: http://feeds.soundcloud.com/users/soundcloud:users:311573348/sounds.rss or find us in the iTunes Store and Stitcher.

[soundcloud url=”https://api.soundcloud.com/tracks/335294348″ params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]

Friedhelm Weinberg is the Executive Director of Human Rights Information and Documentation Systems (HURIDOCS), an NGO that supports organisations and individuals to gather, analyse and harness information to promote and protect human rights. 

Notes from the Conversation

We discussed what it takes to be both a tool developer and a capacity builder. While the two disciplines inform and build upon each other, Friedhelm strongly feels that capacity building needs to come first and serve as the foundation for tool development. The starting point for human rights defenders is to have a clear understanding of what they want to do with data before they start collecting it.

Whereas in the past they used external developers to create tools, they have recently hired developers on staff to work side by side with their capacity builders. They have also been building their own capacity to help human rights defenders use machine learning to process large numbers of documents and extract information about human rights abuses.

Specific projects within Huridocs he talked about:

  • Uwazi is an open-source solution for building and sharing document collections
  • The Collaboratory is their knowledge sharing network for practitioners focusing on information management and human rights documentation.

Readings/Resources that are inspiring his work:

View the full online conversation:

[youtube https://www.youtube.com/watch?v=00WbqeDojP0&w=560&h=315]


The Genesis of The School of Data Fellowship

- July 20, 2017 in Fellowship

In 2013, data literacy was, and in many ways remains, a nascent field. Unsurprisingly, finding reliable trainers to carry out School of Data missions around the world was a struggle. We started our Fellowship programme as a way to address the lack of data literacy trainers throughout the world. Even in 2013, it was clear that while short-term data trainings were effective at raising awareness of potential uses of data for storytelling and advocacy, more long-term interventions were required to actually build data skills in civil society and the media. We designed the School of Data Fellowship to address these two primary challenges that we had identified and were regularly confronting in the course of our work:

  1. there is a severe shortage of data trainers able to work with local communities and adapt training to local needs and/or languages.
  2. organisations and individuals need to engage with data over a long period of time for data activities to become embedded within their work.

Building the foundations

Our Fellowships are nine-month placements with School of Data for existing data-literacy practitioners. We identify high-potential individuals with topical expertise and help them mature as data literacy leaders by working alongside School of Data and our global network. At the start of the Fellowship, we create an individualised programme with each Fellow, designed to equip them with the skills they need to more effectively further data literacy in their community. This programme is built around the core competencies required for furthering data literacy: community building; content creation; and knowledge transfer (see the Data Literacy Activity Matrix for more details on these competencies).

From the outset, we were successful at recruiting high-potential individuals to participate in the programme, and over the years the applicant pool has only grown. We have worked with the Fellows to adapt and translate materials, develop original learning content and provide training to local civil society. Each year, we tweak the programme to reflect learnings from both where we have achieved our goals and where we have fallen short.

An evolving process

Over the years, we have fine-tuned the goals of the programme to reflect what we have found the Fellowship programme to be most effective at achieving as well as what is needed to advance data literacy. These goals are as follows:

  1. identify, train and support individuals who have the potential to become data leaders and sources of expertise in their country and/or region;

  2. kickstart, or strengthen, data literacy communities in the countries where current and former Fellows are active

Prior to 2016, we had not clearly articulated that kickstarting data literacy communities was one of the goals of the Fellowship programme, but it had become obvious that this was a critical component of the sustainability of our work. Given that data literacy is such a nascent field, it was always important in each new city or country for the Fellows to do substantial awareness-raising work. The Fellows who were most successful would provide trainings and organise meet-ups not necessarily to build individual skills, but to start sensitising local communities to the idea that data is a powerful tool for civil society.

A successful approach

In late 2016, we conducted interviews with two dozen School of Data Fellows to better understand whether we were achieving our goals as a programme. These interviews formed the basis of our first Fellowship Outcomes Mapping. Some of the highlights of these interviews can be found below.

The Fellows:

We found that the Fellowship has been successful in achieving its initial goal: creating a community of qualified local trainers who are knowledgeable in School of Data methodologies and actively spreading data literacy in their respective countries:

  1. Better Understanding of the Data Needs and Challenges of Civil Society: Over the years, we have recruited a number of developers, data analysts and entrepreneurs, who, prior to the Fellowship, had little understanding of the specific challenges faced by civil society in using data. Through working with local NGOs, governments and newsrooms, these Fellows gained an understanding of how they could use their skills to serve civil society more effectively.

  2. New Methodologies & Approaches for Training: Through the Fellowship programme, Fellows were able to tap into a network of data literacy practitioners and learn from the best about how to build an effective training programme for any audience.

  3. International Visibility & Connections: Finally, through the School of Data programme, Fellows were introduced to an international community, increasing both the visibility of their work and providing them with a number of new and exciting opportunities to train and to be recruited for consultancies and jobs. Fellows have gone on to work for large newsrooms, international organisations, development agencies and governments.

The local communities

In addition to supporting Fellows to achieve their own goals and personal development, the Fellowship programme also seeks to strengthen data literacy within local civil society. The potential of the Fellowship to have a meaningful impact on local civil society groups was formally acknowledged in 2016, with the inclusion of a specific programmatic goal relating to community-building. As seen in School of Data’s research on the value of different formats of data literacy activities, the Fellowship format is most successful in achieving outcomes related to awareness-building (understanding of data uses, awareness of data skill gaps, knowledge of the data pipeline) as well as the kickstarting of data-related activities locally.

This awareness-raising work is required in every sector. The fact that an emerging data community is focused on transparency and accountability in public finance or extractives does not mean that local health or water CSOs will be sold on the idea of integrating more data into their work. To reflect these learnings, in 2016 we started recruiting Fellows with a particular topical interest or expertise, who would work on data literacy in that specific sector.

Next Steps

We are continuously working to improve the Fellowship process and are overjoyed that most of our past Fellows go on to become active members of the School of Data network. Over the next few months, we will be posting a series of articles about the Fellowship programme, including:

  • Steps we have taken to ensure diversity in each Fellowship class as well as the challenges we still face in terms of inclusivity
  • Funding the low-visibility infrastructure-building work that is a critical part of the Fellowship process
  • How and where we have struggled to make the Fellowship model work, and the plans we have for changing that

We welcome any thoughts and feedback that you have. Get in touch on twitter @schoolofdata or via our contact page.


Data is a Team Sport: One on One with Heather Leson

- July 19, 2017 in Community, Data Blog, Event report

Data is a Team Sport is a series of online conversations held with data literacy practitioners in mid-2017 that explores the ever-evolving data literacy ecosystem.

To subscribe to the podcast series, cut and paste the following link into your podcast manager: http://feeds.soundcloud.com/users/soundcloud:users:311573348/sounds.rss or find us in the iTunes Store and Stitcher.

[soundcloud url=”https://api.soundcloud.com/tracks/333725312″ params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]

This episode features a one-on-one conversation with Heather Leson, the Data Literacy Lead at the International Federation of Red Cross and Red Crescent Societies, where her mandate includes global data advocacy, data literacy and data training programmes in partnership with the 190 National Societies and their 13 million volunteers. She is a past board member of the Humanitarian OpenStreetMap Team (4 years) and Peace Geeks (1 year), and an advisor to MapSwipe, which uses gamification to crowdsource disaster-related satellite imagery. Previously, she worked as Social Innovation Program Manager at the Qatar Computing Research Institute (Qatar Foundation), Director of Community Engagement at Ushahidi, and Community Director at Open Knowledge (School of Data).

Notes from the Conversation:

Heather talked about the need for humanitarian organisations to lead their data projects with a 'do no harm' approach, and how keeping the data and personal information they collect safe is paramount. During her first 10 months developing a data literacy programme for the Federation, she focused on identifying internal expertise and providing opportunities for peer exchange. She has relied heavily on external knowledge, expertise and resources shared amongst data literacy practitioners, through participating in networks and communities such as School of Data.

Heather’s Resources

Blogs/websites

Heather’s work

The full online conversation:
[youtube https://www.youtube.com/watch?v=Vq7JJE_U7sg&w=560&h=315]


Rethinking data literacy: how useful is your 2-day training?

- July 14, 2017 in Research

As of July 2017, School of Data's network includes 14 organisations around the world, which collectively organise hundreds of data literacy events every year. The success of this network-based strategy did not come naturally: in 2013 we had to rethink and move away from our MOOC-like strategy in order to be more relevant to the journalists and civil society organisations we intend to reach.

In 2016, we did the same for our events themselves.

The downside of short-term events

Prominent civic tech members have long complained about the ineffectiveness of hackathons to build long-lasting solutions for the problems they intended to tackle. Yet various reasons have kept the hackathon popular: it’s short-term, can produce decent-looking prototypes, and is well-known even beyond civic tech circles.

The same holds true for the data literacy movement and its most common short-term events: meetups, data and drinks, one-day trainings, two-day workshops. They're easy to run, fund and promote: what's not to love?

Well, we've never really been satisfied with the outcomes we saw from these events, especially for our flagship programme, the Fellowship, which we monitor very closely and aim to improve every year. Following several rounds of surveys and interviews with members of the School of Data network, we were able to pinpoint the issue: our expectations and the actual value of these events were mismatched, leading us to miss critical actions that would have multiplied their value.

The Data Literacy Activity Matrix

To clarify our findings, we put the most common interventions (not all of them are events, strictly speaking) in a matrix, highlighting our key finding that duration is a crucial variable. And this makes sense for several reasons:

  • Fewer people can participate in a longer event, but those who can are generally more committed to the event’s goals

  • Longer events have much more time to develop their content and explore the nuances of it

  • Especially in the field of data literacy, which is focused on capacity building, time and repetition are key to positive outcomes

Data Literacy Activity Matrix

(The categories used to group event formats are based on our current thinking about what makes a data literacy leader, which also underpins the design of our Fellowship programme.)

Useful for what?

The matrix allowed us to think critically about the added value of each subcategory of intervention. What is the effective impact of an organisation doing mostly short-term training events compared to another one focusing on long-term content creation? Drawing again from the interviews we’ve done and some analysis of the rare post-intervention surveys and reports we could access (another weakness of the field), we came to the following conclusions:

  • Very short-term and short-term activities are mostly valuable for awareness-raising and community-building.

  • Real skill-building happens through medium- to long-term interventions.

  • Content creation is best focused on supporting skill-building interventions and data-driven projects (rather than hoping that people come to your content and learn by themselves).

  • Data-driven projects (run in collaboration with your beneficiaries) are the ones creating the clearest impact (but not necessarily the longest-lasting).

Data Literacy Matrix - Value Added

It is important, though, not to set short-term and long-term interventions in opposition. Not only can the difference be fuzzy (a long-term intervention can be a series of regular, linked short-term events, for example) but both play critically important roles: who is going to apply to a data training if people are not aware of the importance of data? Conversely, recognising the specific added value of each intervention also requires acting accordingly: we advise against organising short-term events without a community engagement strategy to sustain the event's momentum.

In hindsight, all of the above may sound obvious. But it is mostly relevant from the perspective of the beneficiary. From the point of view of the organisation running a data literacy programme, the cost/benefit calculation looks different.

For example, short-term interventions are a great way to find one’s audience, get new trainers to find their voice, and generate press cheaply. Meanwhile, long-term interventions are costly and their outcomes are harder to measure: is it really worth it to focus on training only 10 people for several months, when the same financial investment can bring hundreds of people to one-day workshops? Even when the organisation can see the benefits, their funders may not. In a field where sustainability is still a complicated issue many organisations face, long-term actions are not a priority.

Next steps

School of Data has taken steps to apply these learnings to its programmes.

  • The Curriculum programme, which initially focused on the production and maintenance of online content available on our website, has been expanded to include offline trainings during our annual event, the Summer Camp, and online skillshares throughout the year;

  • Our recommendations to members regarding their interventions systematically refer to the data literacy matrix in order for them to understand the added value of their work;

  • Our Data Expert programme has been designed to include both data-driven project work and medium-term training of beneficiaries, differentiating it further from straightforward consultancy work.

We have also identified three directions in which we can research this topic further:

  • Mapping existing interventions: the number, variety and quality of data literacy interventions are increasing every year, but so far no effort has been made to map them in order to identify the strengths and gaps of the field.

  • Investigating individual subgroups: the matrix is a good starting point for interrogating best practices and concrete outcomes in each subgroup, in order to provide more granular recommendations to actors in the field and to inform the design of new intervention models.

  • Exploring thematic relevance: the audience, goals and constraints of, say, data journalism interventions differ substantially from those of interventions undertaken within the extractives data community. Further research into how they differ would help develop topic-relevant recommendations.


Data is a Team Sport: Advocacy Organisations

- July 12, 2017 in Community, Data Blog, Event report

Data is a Team Sport is our open-research project exploring the data literacy eco-system and how it is evolving in the wake of post-truth politics, fake news and data-driven confusion. We are producing a series of videos, blog posts and podcasts based on online conversations we are having with data literacy practitioners.

To subscribe to the podcast series, copy and paste the following link into your podcast manager: http://feeds.soundcloud.com/users/soundcloud:users:311573348/sounds.rss or find us in the iTunes Store and Stitcher.

[soundcloud url="https://api.soundcloud.com/tracks/332772865" params="color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false" width="100%" height="166" iframe="true" /]

In this episode we discussed data-driven advocacy organisations with:

  • Milena Marin is Senior Innovation Campaigner at Amnesty International, where she currently leads Amnesty Decoders – an innovative project aiming to engage digital volunteers in documenting human rights violations using new technologies. Previously she worked as programme manager of School of Data, and for over 4 years with Transparency International, where she supported TI's global network in using technology in the fight against corruption.
  • Sam Leon is Data Lead at Global Witness, focusing on the use of data to fight corruption and on how to turn this information into change-making stories. He is currently working with a coalition of data scientists, academics and investigative journalists to build analytical models and tools that enable anti-corruption campaigners to understand and identify the corporate networks used for nefarious and corrupt practices.

Notes from the Conversation

In order to get their organisations to see the value and benefit of using data, both have had to demonstrate results and have looked for opportunities to show effective impact. Advocates are often quick to see data and new technologies as easy answers to their challenges, yet have difficulty foreseeing the realities of implementing the complex projects that utilise them.

Data provides advocates with ways to reveal the extent of a problem and to add depth to qualitative and individual stories. Milena credits the work of School of Data for the fact that journalists now expect Amnesty to back up their stories with data. However, the term 'fake news' is used to discredit their work, and as a result they work harder at presenting verifiable data.

Data projects can also provide additional benefit to advocacy organisations by engaging stakeholders. Amnesty's Decoders project has involved 45,000 volunteers; along with extracting data from a huge amount of video, it has given those volunteers a deeper understanding of Amnesty's work. Global Witness is striving to make its data publicly accessible so it can benefit its allies, while acknowledging that it is still learning about the ethical and privacy considerations that must be addressed before open datasets can be a default. Both organisations are actively learning in this area.

They also touched on how important it is for their organisations to learn from others. They look to external consultants and intermediaries to help fill organisational gaps in data expertise, and find it critical for organisations like Open Knowledge and School of Data to convene practitioners from different disciplines to share methodologies and lessons learned. During the conversation, they offered to share their internal curricula with each other.

More about their work

Milena

Sam

Resources and Readings

From FabRiders

View the Full Conversation:
[youtube https://www.youtube.com/watch?v=Row0OtRhlao]


Data is a Team Sport: One on One with Daniela Lepiz

- July 3, 2017 in Community, Data Blog, Event report

Data is a Team Sport is a series of online conversations held with data literacy practitioners in mid-2017, exploring the ever-evolving data literacy eco-system.

To subscribe to the podcast series, copy and paste the following link into your podcast manager: http://feeds.soundcloud.com/users/soundcloud:users:311573348/sounds.rss or find us in the iTunes Store and Stitcher.

[soundcloud url="https://api.soundcloud.com/tracks/331054739" params="color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false" width="100%" height="166" iframe="true" /]

This episode features a one-on-one conversation with Daniela Lepiz, a Costa Rican data journalist and trainer who is currently the Investigation Editor for CENOZO, a West African investigative journalism project that aims to promote and support cross-border data investigation and open data in the region. She has a master's degree in data journalism from the Rey Juan Carlos University in Madrid, Spain, and was previously involved with OpenUP South Africa, working with journalists to produce data-driven stories. Daniela is also a trainer for the Tanzania Media Foundation and has been involved in many other projects with South African media, La Nacion in Costa Rica and other international organisations.

Notes from the conversation

Daniela spoke to us from Burkina Faso and reflected on the importance of data-driven journalism in holding power to account. Her project aims to train and support journalists working across borders in West Africa to use data to expose corruption and human rights violations. To identify journalists to participate in the project, they seek individuals who are experienced, passionate and curious. The project engages media houses, such as Premium Times in Nigeria, to ensure that there are respected outlets to publish their stories. Daniela raised the following points:

  • As the media landscape continues to evolve, data literacy is increasingly becoming a required competency.
  • Journalists do not necessarily have a background in mathematics or statistics, and are often intimidated by the idea of having to use these concepts in their stories.
  • Data stories are best done in teams of people with complementary skills. This can go against a traditional approach to journalism, in which journalists work alone and tightly guard their sources.
  • It is important that data training programmes also work with journalists and better understand their needs.

Resources she finds inspiring

Her blog posts

The full online conversation:

[youtube https://www.youtube.com/watch?v=9l4SI6lm130]

Daniela’s bookmarks!

These are the resources she uses the most often.

.Rddj – Resources for doing data journalism with R
Comparing Columns in Google Refine | OUseful.Info, the blog…
Journalist datastores: where can you find them? A list. | Simon Rogers
AidInfoPlus – Mastering Aid Information for Change

Data skills

Mapping tip: how to convert and filter KML into a list with Open Refine | Online Journalism Blog
Mapbox + Weather Data
Encryption, Journalism and Free Expression | The Mozilla Blog
Data cleaning with Regular Expressions (NICAR) – Google Docs
NICAR 2016 Links and Tips – Google Docs
Teaching Data Journalism: A Survey & Model Curricula | Global Investigative Journalism Network
Data bulletproofing tips for NICAR 2016 – Google Docs
Using the command line tabula extractor tool · tabulapdf/tabula-extractor Wiki · GitHub
Talend Downloads

Github

Git Concepts – SmartGit (Latest/Preview) – Confluence
GitHub For Beginners: Don’t Get Scared, Get Started – ReadWrite
Kartograph.org
LittleSis – Profiling the powers that be

Tableau customized polygons

How can I create a filled map with custom polygons in Tableau given point data? – Stack Overflow
Using Shape Files for Boundaries in Tableau | The Last Data Bender
How to make custom Tableau maps
How to map geographies in Tableau that are not built in to the product (e.g. UK postcodes, sales areas) – Dabbling with Data
Alteryx Analytics Gallery | Public Gallery
TableauShapeMaker – Adding custom shapes to Tableau maps | Vishful thinking…
Creating Tableau Polygons from ArcGIS Shapefiles | Tableau Software
Creating Polygon-Shaded Maps | Tableau Software
Tool to Convert ArcGIS Shapefiles into Tableau Polygons | Tableau and Behold!
Polygon Maps | Tableau Software
Modeling April 2016
5 Tips for Making Your Tableau Public Viz Go Viral | Tableau Public
Google News Lab
HTML and CSS
Open Semantic Search: Your own search engine for documents, images, tables, files, intranet & news
Spatial Data Download | DIVA-GIS
Linkurious – Linkurious – Understand the connections in your data
Apache Solr
Apache Tika
Neo4j Graph Database: Unlock the Value of Data Relationships
SQL: Table Transformation | Codecademy
dc.js – Dimensional Charting Javascript Library
The People and the Technology Behind the Panama Papers | Global Investigative Journalism Network
How to convert XLS file to CSV in Command Line [Linux]
Intro to SQL (IRE 2016) · GitHub
Malik Singleton – SELECT needle FROM haystack;
Investigative Reporters and Editors | Tipsheets and links

SQL_PYTHON

More data

2016-NICAR-Adv-SQL/SQL_queries.md at master · taggartk/2016-NICAR-Adv-SQL · GitHub
advanced-sql-nicar15/stats-functions.sql at master · anthonydb/advanced-sql-nicar15 · GitHub
Statistical functions in MySQL • Code is poetry
Data Analysis Using SQL and Excel – Gordon S. Linoff – Google Books
Using PROC SQL to Find Uncommon Observations Between 2 Data Sets in SAS | The Chemical Statistician
mysql – Query to compare two subsets of data from the same table? – Database Administrators Stack Exchange
sql – How to add “weights” to a MySQL table and select random values according to these? – Stack Overflow
sql – Fast mysql random weighted choice on big database – Stack Overflow
php – MySQL: Select Random Entry, but Weight Towards Certain Entries – Stack Overflow
MySQL Moving average
Calculating descriptive statistics in MySQL | codediesel
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, …
R, MySQL, LM and quantreg
26318_AllText_Print.pdf
ddi-documentation-english-572 (1).pdf
Categorical Data — pandas 0.18.1+143.g3b75e03.dirty documentation
python – Loading STATA file: Categorial values must be unique – Stack Overflow
Using the CSV module in Python
14.1. csv — CSV File Reading and Writing — Python 3.5.2rc1 documentation
csvsql — csvkit 0.9.1 documentation
weight samples with python – Google Search
python – Weighted choice short and simple – Stack Overflow
7.1. string — Common string operations — Python v2.6.9 documentation
Introduction to Data Analysis with Python | Lynda.com
A Complete Tutorial to Learn Data Science with Python from Scratch
GitHub – fonnesbeck/statistical-analysis-python-tutorial: Statistical Data Analysis in Python
Verifying the email – Email Checker
A little tour of aleph, a data search tool for reporters – pudo.org (Friedrich Lindenberg)
Welcome – Investigative Dashboard Search
Investigative Dashboard
Working with CSVs on the Command Line
FiveThirtyEight’s data journalism workflow with R | useR! 2016 international R User conference | Channel 9
Six issue when installing package · Issue #3165 · pypa/pip · GitHub
python – Installing pip on Mac OS X – Stack Overflow
Source – Journalism Code, Context & Community – A project by Knight-Mozilla OpenNews
Introducing Kaggle’s Open Data Platform
NASA just made all the scientific research it funds available for free – ScienceAlert
District council code list | Statistics South Africa
How-to: Index Scanned PDFs at Scale Using Fewer Than 50 Lines of Code – Cloudera Engineering Blog
GitHub – gavinr/geojson-csv-join: A script to take a GeoJSON file, and JOIN data onto that file from a CSV file.
7 command-line tools for data science
Python Basics: Lists, Dictionaries, & Booleans
Jupyter Notebook Viewer
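Several of the bookmarks above ("weight samples with python", "Weighted choice short and simple") deal with weighted random sampling. As a minimal sketch of the idea, using only Python's standard library and entirely hypothetical data:

```python
import random

# Hypothetical example: pick stories to spot-check,
# weighting each story by its page views
stories = ["budget", "health", "water"]
page_views = [5000, 3000, 2000]

# random.choices draws with replacement, proportionally to the weights
picks = random.choices(stories, weights=page_views, k=5)
print(picks)
```

Heavily viewed stories will appear more often in `picks` on average, which is the core of the weighted-selection techniques those links discuss.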

PYTHON FOR JOURNALISTS


Reshaping and Pivot Tables — pandas 0.18.1 documentation
Reshaping in Pandas – Pivot, Pivot-Table, Stack and Unstack explained with Pictures – Nikolay Grozev
Pandas Pivot-Table Example – YouTube
pandas.pivot_table — pandas 0.18.1 documentation
Pandas Pivot Table Explained – Practical Business Python
Pivot Tables In Pandas – Python
Pandas .groupby(), Lambda Functions, & Pivot Tables
Counting Values & Basic Plotting in Python
Creating Pandas DataFrames & Selecting Data
Filtering Data in Python with Boolean Indexes
Deriving New Columns & Defining Python Functions
Python Histograms, Box Plots, & Distributions
Resources for Further Learning
Python Methods, Functions, & Libraries
Python Basics: Lists, Dictionaries, & Booleans
Real-world Python for data-crunching journalists | TrendCT
Cookbook — agate 1.4.0 documentation
3. Power tools — csvkit 0.9.1 documentation
Tutorial — csvkit 0.9.1 documentation
4. Going elsewhere with your data — csvkit 0.9.1 documentation
2. Examining the data — csvkit 0.9.1 documentation
A Complete Tutorial to Learn Data Science with Python from Scratch
For Journalism
ProPublica Summer Data Institute
Percentage of vote change | CARTO
Data Science | Coursera
Data journalism training materials
Pythex: a Python regular expression editor
A secure whistleblowing platform for African media | afriLEAKS
PDFUnlock! – Unlock secured PDF files online for free.
The digital journalist’s toolbox: mapping | IJNet
Bulletproof Data Journalism – Course – LEARNO
Transpose columns across rows (grefine 2.5) ~ RefinePro Knowledge Base for OpenRefine
Installing NLTK — NLTK 3.0 documentation
1. Language Processing and Python
Visualize any Text as a Network – Textexture
10 tools that can help data journalists do better work, be more efficient – Poynter
Workshop Attendance
Clustering In Depth · OpenRefine/OpenRefine Wiki · GitHub
Regression analysis using Python
DataBasic.io
R for Every Survey Analysis – YouTube
Git – Book
NICAR17 Slides, Links & Tutorials #NICAR17 // Ricochet by Chrys Wu
Register for Anonymous VPN Services | PIA Services
The Bureau of Investigative Journalism
dtSearch – Text Retrieval / Full Text Search Engine
Investigation, Cybersecurity, Information Governance and eDiscovery Software | Nuix
How we built the Offshore Leaks Database | International Consortium of Investigative Journalists
Liz Telecom/Azimmo – Google Search
First Python Notebook — First Python Notebook 1.0 documentation
GitHub – JasonKessler/scattertext: Beautiful visualizations of how language differs among document types
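A good share of the bookmarks above cover pivot tables in pandas. As a quick illustrative sketch of what those tutorials teach (hypothetical spending data; assumes pandas is installed):

```python
import pandas as pd

# Hypothetical dataset: spending records by region and year
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "year":   [2015, 2016, 2015, 2016],
    "amount": [100, 150, 80, 120],
})

# Reshape the long table into a region-by-year grid of totals
pivot = pd.pivot_table(df, index="region", columns="year",
                       values="amount", aggfunc="sum")
print(pivot)
```

Each row of `pivot` is a region, each column a year, and each cell the summed amount, the same reshaping that the "Reshaping and Pivot Tables" pandas documentation linked above walks through in detail.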


Announcing our new member: ‘Caribbean School of Data’

- June 21, 2017 in Announcement, Community

Today we're delighted to welcome a new organisational member to our network: the Caribbean Open Institute! They will run the Caribbean School of Data initiative.

The new Caribbean initiative is led by Maurice McNaughton, who coordinates the Caribbean Open Institute as the regional node for Open Data for Development network activities in the Caribbean. The COI coalition of partner organisations and individuals conducts regional open data research, advocacy, and capacity-building activities such as the Global Open Data Index and the Open Data Barometer. The new "Caribbean School of Data" will be hosted at the Mona School of Business & Management, UWI, and affiliate institutions are planned for other countries across the Caribbean (including Trinidad & Tobago, Haiti, Cuba and Guyana).

Already in the group's pipeline are a virtual incubation model to encourage and facilitate data-driven entrepreneurial startups, as well as a project to build a Caribbean data competency map that identifies, and makes searchable and accessible, individual and institutional clusters of data skills, knowledge and capabilities in the region.

School of Data is already working with the Caribbean Open Institute on a data literacy project in Haiti called "Going Global: Digital Jobs and Gender", for which we have recently recruited two Fellows.

Welcome, Caribbean School of Data!


About School of Data members

School of Data's organisational members are legally independent groups, affiliated formally through a memorandum of understanding. Our members are groups whose mission and activities are aligned with ours and with whom we plan to collaborate on this data literacy work. Caribbean School of Data is our fourteenth member!
