You are browsing the archive for Sam Leon.

Forbes Philippines & BlogWatch win best story award as Data Journalism PH wraps up

- December 15, 2015 in Uncategorized

At the end of November, Open Knowledge, School of Data and the Philippine Center for Investigative Journalism (PCIJ) wrapped up their six-month data journalism training for media organisations in the Philippines, the first of its kind.

Over 100 journalists and civil servants gathered at the Cocoon Hotel in Quezon City to see the twelve participating media teams present their work and listen to keynotes from The Guardian’s Caelainn Barr, Undersecretary Richard Moya (Open Data Task Force Philippines), Kenneth Abante (Department of Finance) and Rogier Van Den Brink (World Bank) on the interplay between government open data and public integrity journalism.

Kenneth Abante from the Department of Finance speaking at the wrap-up event of Data Journalism PH 2015

The World Bank-funded programme equipped participating newsrooms with the tools and techniques for mining the ever-increasing volumes of public data being published by Philippine government departments via their national government data portal. After an initial intensive three-day training in July 2015, the teams received regular remote training sessions on data skills from Open Knowledge and editorial support from PCIJ as they progressed with their proposed data stories. Teams worked on diverse topics, from probing who really benefits from the Philippines’ Bottom Up Budgeting initiative to following where money allocated to the reconstruction effort after Typhoon Yolanda actually went.

Five of the twelve participating teams were able to publish their stories before the event with a number of teams finalising their articles for print publications in the new year. Forbes Philippines and BlogWatch were awarded prizes for best story by PCIJ and Open Knowledge based on the originality of their stories, their approach to data collection and the strength of their narrative. Forbes Philippines collected data from the SEC on independent directors and correlated this with company performance to give a unique view on corporate accountability in the Philippines. BlogWatch persevered with a range of large publicly available datasets on aid and reconstruction. The team also took to social media to crowdsource information that was missing in order to follow the money that was plugged into various projects in the wake of the devastation caused by Typhoon Yolanda.

Winning teams BlogWatch (Jane Uymatiao & Noemi Lardizabal-Dado) and Forbes Philippines (Lala Rimando & Lorenxo Subido) with Sam Leon (Open Knowledge) and Malou Mangahas (PCIJ)

Philippine Star ran an analysis of data published by the Department of Education on how many new schools were being built that would not have access to electricity and water. Business World looked at new trends in investment amongst Filipino citizens and summarised their results in an infographic. Calbayog Post investigated how projects approved under the Bottom Up Budgeting scheme in Samar Province had performed. The Financial Times produced a visual slideshow on the Philippines’ dependence on renewables and the opportunities for hydro power using data published by the Department of Energy. You can read the published stories below, with the exception of Forbes Philippines’, which will be published in their January 2016 editions. Other participating teams, including Rappler, PCIJ, Interaksyon, ABS-CBN, Bloomberg and Inquirer, were not able to publish in time for the deadline, but hope to publish their stories in the coming weeks.

The Philippines has made substantial progress in recent years in government transparency, launching a national government open data portal in 2014 and setting up an Open Data Task Force within the civil service to catalyse further open data releases across national and local government. The programme demonstrated the promise of open data by enabling participating journalists to shed light on issues of critical national importance to the broader public. It also put into sharp focus areas that needed more work from publishing government departments. Too many critical datasets were incomplete, not actively maintained, inconsistent in ways that made them difficult to analyse, or not available for free.

A selection of the online tutorials, data recipes and training material has been made available for all to use on the project website, including guides to using a range of tools such as CartoDB and DocumentCloud.

###Data-driven articles produced as part of the programme

Forbes Philippines and BlogWatch were awarded prizes for best story by PCIJ and Open Knowledge based on the originality of their stories, their approach to data collection and the strength of their narrative.

###Testimonials from some of the participant journalists

“The workshop allowed me to be braver in pursuing irregularities and anomalies with the help of data, but also to be careful in making conclusions. It’s a nice intro to data journalism.” – Michael Joseph Bueza, Rappler

“Data Journalism PH 2015 is a workshop every serious journalist should take. More than teaching me practical skills – how to create maps, infographics, and spreadsheets – it made me realize how important it is to use hard facts, as opposed to merely relying on statements, to create a public that is more informed and more critical.” – Patricia Aquino, Interaksyon

“A great program! PCIJ has always set the standard for investigative journalism. Open Knowledge did a great job teaching us data journalism and the different skills it requires. Looking forward to continuing to work with the whole PCIJ team. The reports of the other teams were informative as well. Thumbs up to the whole group.” – Nestor Corrales, Inquirer

“The best journalism program ever that united my writing and analysis skills.” – Rommel Rutor, Calbayog Post

“It was a great chance to know how and why consumers and processors of data (i.e. the journalists) are seeking more from the producers of data (government, private sector).” – Lala Rimando, Forbes Philippines

“It was a great opportunity to be part of this programme. I’m really interested in improving my technical and editorial skills on data mining and thanks to PCIJ and Open Knowledge, I’ve really learned a lot.” – Kia Obang, BusinessWorld Publishing

“Open data seems like an issue reserved for a select number of people, but it’s a subject that so many people need to be familiar with. Learning more about it through the programme can give you the right tools to turn open data into powerful analysis.” – Kyle Subido, Forbes Philippines

“If you want to learn how to dig into huge data, you gotta take this training.” – Jose Gerwin Babob, Calbayog Post

“As a reporter covering different beats, including survey results, the training helped me learn new skills that would make me more effective in writing investigative stories by correctly analyzing and interpreting datasets. I really appreciate the efforts of the PCIJ and the Open Knowledge in coming up with such training for journalists like us. This will be a very good addition to our resumes. :-)” – Helen Flores, Philippine Star

“Had a grand time. Learned about the existence of free online tools which could potentially take off about 25% of my previous workload.” – Dan Paurom, Inquirer

“The Data Journalism Philippines 2015 is a timely program for Filipino journalists who are interested in making sense of the huge amount of data that are readily available online. The things that we have learned during the duration of the program equipped us with the necessary skills needed to produce quality data-driven articles for our respective organizations.” – Jan Victor Mateo, Philippine Star

Call for applications for Data Journalism Philippines 2015

- May 27, 2015 in Uncategorized

The Open Knowledge Foundation in partnership with the Philippine Center for Investigative Journalism is pleased to announce the launch of Data Journalism Ph 2015. Supported by the World Bank, the program will train journalists and citizen media in producing high-quality, data-driven stories.

In recent years, government and multilateral agencies in the Philippines have published large amounts of data, for example through the government’s recently launched Open Data platform. This has been accompanied by other platforms that track the implementation and expenditure of flagship programs such as Bottom-Up-Budgeting and infrastructure, as well as reconstruction platforms including the Foreign Aid Transparency Hub. The training aims to encourage more journalists to use these and other online resources to produce compelling investigative stories.

Data Journalism Ph 2015 will train journalists on the tools and techniques required to gain and communicate insight from public data, including web scraping, database analysis and interactive visualization. The program will support journalists in using data to back their stories, which will be published by their media organization over a period of five months.

Participating teams will benefit from the following:

  • A 3-day data journalism training workshop by the Open Knowledge Foundation and PCIJ in July 2015 in Manila
  • A series of online tutorials on a variety of topics from digital security to online mapping
  • Technical support in developing interactive visual content to accompany their published stories

##Apply now!

Teams of up to three members working with the same print, TV, or online media agencies in the Philippines are invited to submit an application here.

Participants will be selected on the basis of the data story projects they pitch focused on key datasets including infrastructure, reconstruction, participatory budgeting, procurement and customs. Through Data Journalism Ph 2015 and its trainers, these projects will be developed into data stories to be published by the participants’ media organizations.

##Join the launch

Open Knowledge and PCIJ will host a half-day public event for those interested in the program in July in Quezon City. If you would like to receive full details about the event, please sign up here.

To follow the programme as it progresses go to the Data Journalism 2015 Ph project website.

Digital Methods Initiative Winter School, University of Amsterdam

- January 27, 2015 in Events

Exploding book in the pulpit of the Algemene Doopsgezinde Sociëteit in Spui, Amsterdam where some of the sprint took place

Last week I attended the 7th annual Winter School at the University of Amsterdam. Run by the Digital Methods Initiative, it took the form of a data sprint in which students joined professional developers and designers to answer research questions using social media data.

The DMI group at Amsterdam have developed and collated a suite of easy-to-use tools specifically for this kind of research. They are well worth checking out for anyone interested in this field and they cover a range of techniques from web scraping to list triangulation, and can be found online here.

I joined a group looking at bias across three APIs through which you can acquire Twitter data: the Search API, the Streaming API and the proprietary Firehose endpoint – generally regarded as the most complete source of Twitter data. We had three datasets, one captured from each API, covering a critical period between 7th and 15th October 2014 when the Hong Kong protests were taking place.

Other groups took on a range of tasks from mapping the open data revolution to tracking the global climate change debate. All projects deployed a range of data wrangling techniques to investigate these complex social, political and cultural phenomena.

A few things I learned:

  • Anyone wanting to use social media data to answer research questions about society and culture needs more than just spreadsheet skills. These datasets are generally larger than what Excel can comfortably handle, so basic database skills are a massive help.
  • Off-the-shelf tools for data analysis are brilliant, but often you need to tailor lines of enquiry to your specific research question. Having some knowledge of programming means that you can take a much more flexible approach than when relying on the GUI tools.
  • Working in such a collaborative fast-paced environment meant that reproducibility (i.e. where different parts of the team would re-use scripts and code developed by other parts of the team) was essential, alongside creating documentation on the fly. We found IPython notebooks especially useful for this, whereas analytical steps taken in Excel were harder to reproduce.
  • Free Twitter data – like that which can be acquired from the Search and Streaming APIs – is still good, and sometimes better than what you get through the proprietary APIs. When investigating online reactions to contentious and controversial events – such as the Hong Kong protests – tweets will inevitably be removed, both by users and by Twitter. If you want to get the full story, it’s far better to scrape data as it comes in through the streaming API.
  • We’ve written about it before on this blog but the Pandas module for Python is brilliant for data wrangling and analysis and well worth getting to know if you plan on working with big datasets. It’s quick, flexible and powerful.
  • Nothing beats hands-on learning when it comes to technical skills. Having a motivating research question and some real life data is the best way to learn how to use the multitude of tools now at any budding data wrangler’s disposal. I learnt more in a week than I could have in months reading about tools and languages in the abstract!
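
To make the point about basic database skills a little more concrete, here is a minimal sketch of the sort of comparison a group looking at API bias might run, using Python’s built-in `sqlite3` module. The table layout, the source labels and the tweet IDs are all invented for illustration; this is not the code or the data from the sprint.

```python
import sqlite3

# Toy stand-in for three captures of the same event.
# Source labels and tweet IDs are invented for illustration.
captures = [
    ("search", 101), ("search", 102),
    ("stream", 101), ("stream", 102), ("stream", 103), ("stream", 104),
    ("firehose", 101), ("firehose", 102), ("firehose", 103),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (source TEXT, tweet_id INTEGER)")
conn.executemany("INSERT INTO tweets VALUES (?, ?)", captures)

# How many distinct tweets did each capture contain?
counts = dict(conn.execute(
    "SELECT source, COUNT(DISTINCT tweet_id) FROM tweets GROUP BY source"
).fetchall())

# Which tweets were seen on the stream but are absent from the
# firehose set (for example because they were later removed)?
missing = [row[0] for row in conn.execute(
    """
    SELECT DISTINCT tweet_id FROM tweets
    WHERE source = 'stream'
      AND tweet_id NOT IN (
          SELECT tweet_id FROM tweets WHERE source = 'firehose'
      )
    ORDER BY tweet_id
    """
)]

print(counts)
print(missing)
```

The same two queries work unchanged on millions of rows, which is roughly the point at which a spreadsheet starts to struggle.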

For those interested in attending a DMI school in the future – take a look at the summer school coming later in 2015.

Uncovering Asia and Data Journalism in the Philippines

- November 28, 2014 in Events

Last weekend I went to Manila to attend Asia’s first international conference for investigative journalists, Uncovering Asia. Run by the Global Investigative Journalism Network and the Philippine Center for Investigative Journalism (PCIJ), the event brought together over 200 journalists from countries across Asia and the world.

Sheila Coronel giving her keynote at the conference. A transcript of her talk has been published on Rappler.

The opening keynote was given by Sheila Coronel, Professor of Investigative Journalism at Columbia Journalism School and co-founder of PCIJ. Coronel rose to prominence in the Philippines as a journalist for the magazine Panorama, where she reported widely on the human rights abuses of the Marcos dictatorship in its final years. Her talk, entitled 9 billion eyes: Holding power to account in the world’s largest continent, painted a picture of Asia in which conditions were generally improving for investigative journalists, but in which there was more need than ever for a vibrant fourth estate to combat the abuse of power and corruption.

Coronel’s opening talk covered a range of corruption scandals and the way they affect people’s lives in Asia. She referred to the environmental damage wrought by egregious logging companies in Malaysia, as well as the poorly constructed Sichuanese buildings that crumbled in the earthquake of 2008, killing thousands of people. In the Philippines, she mentioned the botched public road projects that hamper the ability of farmers to move their goods around the country, as well as a shortage of textbooks in classrooms that prevents many Philippine children from getting the education to which they’re entitled.

Coronel also spoke of the factors that continued to inhibit the development of a vibrant fourth estate in many Asian countries. These included government gagging laws, like those passed by the Japanese government in the aftermath of the Fukushima disaster, but also the social backlash faced by journalists who publish on sensitive topics, such as Islam in Indonesia. The physical and legal risks that investigative journalists in the Philippines expose themselves to were underlined by the fact that the first day of the conference fell on the fifth anniversary of the Ampatuan Massacre, the single largest killing of journalists.

Vigil for the victims of the Ampatuan Massacre on the first evening of the conference in Quezon City, Philippines.

Despite the seriousness of the challenges still faced within many Asian newsrooms, the mood of Coronel’s opening keynote and of the conference more generally was optimistic. One of the major threads of the conference was the opportunity for investigative journalism that exploits technology and public data. In the wake of international initiatives like the Open Government Partnership, most Asian journalists now find themselves with an ever-expanding quantity of public interest open data. The Philippines, for instance, unveiled its open data portal earlier this year, which contains a wide range of data on government spending, procurement and reconstruction – topics that have long been the subject of corruption investigations by the likes of PCIJ but about which information has traditionally been patchy and scarce.

Credit goes to the organisers who developed a brilliant data track that gave participants an opportunity to learn how to find stories in public data. As we at Open Knowledge and School of Data believe, data literacy skills are critical if initiatives to release open data are to drive accountability. I ran a session with Nils Mulvad on data cleansing using OpenRefine. Other workshops covered mapping with Google Fusion Tables, building collaborative databases and using OCCRP’s Investigative Dashboard. There were also two very well attended sessions on digital security run by Bobby Soriano of Tactical Tech and Smari McCarthy that gave participants hands-on experience with free and affordable tools for protecting sensitive data from intrusion. The BBC’s Paul Myers ran a brilliant workshop on web forensics with a rapid-fire demonstration of how to use these tools to find hidden information on the web. This included using domain registry searches effectively, checking for hidden files on website servers, using the powerful Facebook graph search and analysing image metadata. Many of the learnings from the data track were captured in the tip sheets published on the Uncovering Asia website. Go check them out here!

Global Witness and Open Knowledge – Working together to investigate and campaign against corruption related to the extractives industries

- November 17, 2014 in Uncategorized

Sam Leon, one of Open Knowledge’s data experts, talks about his experiences working as a School of Data Embedded Fellow at Global Witness.

Global Witness are a Nobel Peace Prize-nominated not-for-profit organisation devoted to investigating and campaigning against corruption related to the extractives industries. Earlier this year they received the TED Prize and were awarded $1 million to help fight corporate secrecy, on the back of which they launched their End Anonymous Companies campaign.

In February 2014 I began a six-month ‘Embedded Fellowship’ at Global Witness, one of the world’s leading anti-corruption NGOs. Global Witness are no strangers to data. They’ve been publishing pioneering investigative research for over two decades now, piecing together the complex webs of financial transactions, shell companies and middlemen that so often lie at the heart of corruption in the extractives industries.

Like many campaigning organisations, Global Witness are seeking new and compelling ways to visualise their research, as well as to make more effective use of the large amounts of public data that have become available in the last few years.

“Sam Leon has unleashed a wave of innovation at Global Witness”

-Gavin Hayman, Executive Director of Global Witness

As part of my work, I’ve delivered data trainings at all levels of the organisation – from senior management to the front line staff. I’ve also been working with a variety of staff to use data collected by Global Witness to create compelling infographics. It’s amazing how powerful these can be to draw attention to stories and thus support Global Witness’s advocacy work.

The first interactive we published, on the sharp rise in deaths of environmental defenders, demonstrated this. The way we were able to pack some of the core insights of a much more detailed report into a series of images that people could dig into proved a hit on social media and let the story travel further.

See here for the full infographic on Global Witness’s website.

But powerful visualisation isn’t just about shareability. It’s also about making a point that would otherwise be hard to grasp without visual aids. Global Witness regularly publish mind-boggling statistics on the scale of corruption in the oil and gas sector.

“The interactive infographics we worked on with Open Knowledge made a big difference to the report’s online impact. The product allowed us to bring out the key themes of the report in a simple, compelling way. This allowed more people to absorb and share the key messages without having to read the full report, but also drew more people into reading it.”
-Oliver Courtney, Senior Campaigner at Global Witness

Take, for instance, the $1.1 billion that the Nigerian people were deprived of due to the corruption around the sale of Africa’s largest oil block, OPL 245.

$1.1 billion doesn’t mean much to me; it’s too big a number. What we sought to do visually was represent the loss to Nigerian citizens in terms of things we could understand, like basic health care provision and education.

See here for the full infographic on Shell, ENI and Nigeria’s Missing Millions.

In October 2014, to accompany Global Witness’s campaign against anonymous company ownership, we worked with developers from data journalism startup J++ on The Great Rip Off map.

The aim was to bring together and visualise the vast number of corruption case studies involving shell companies that Global Witness and its partners have unearthed in recent years.

The Great Rip Off!

It was a challenging project that required input from designers, campaigners, developers, journalists and researchers, but we’re proud of what we produced.

Open data principles were followed throughout, as Global Witness were committed to creating a resource that its partners could draw on in their advocacy efforts. The underlying data was made available in bulk under a Creative Commons Attribution-ShareAlike license and open source libraries like Leaflet.js were used. Other parties were also invited to submit case studies to the database.

“It’s transformed the way we work, it’s made us think differently how we communicate information: how we make it more accessible, visual and exciting. It’s really changed the way we do things.”
-Brendan O’Donnell, Campaign Leader at Global Witness

For more information on the School of Data Embedded Fellowship Scheme, and to see further details on the work we produced with Global Witness, including interactive infographics, please see the full report here.

4 Network Visualisation Tools

- August 20, 2014 in Uncategorized

Network visualisation has become an important tool in the armoury of the data wrangler. An increasing volume of research and journalism is using network analysis and visualisation to gain insight into the real-world social, political and cultural networks that influence our lives. Take for instance GFK’s analysis of the European political Twittersphere or Gilad Lotan’s piece on personalising propaganda in the Israel-Gaza war.

Instagram co-tag graph produced using Gephi, highlighting three distinct topical communities: 1) pro-Israeli (Orange), 2) pro-Palestinian (Yellow), and 3) Muslim (Pink). Source:

Below I’ve listed some of the top free tools for sketching and analysing the networks that you produce in the course of your investigations. The first two tools are primarily for those who want to visualise networks based on desk research and where there is a need to include many different types of entity. The latter two, Gephi and Google Fusion Tables, are more tailored for use with larger datasets. Gephi in particular lets you perform in-depth statistical analysis of networks, which can be especially useful for analysing social networks.

##VIS: Visual Investigative Scenarios

A tool for producing simple but stylish network maps using a stock of icons for entities that often come up in investigations, e.g. people, companies and cases. It also gives you the option to share and embed your networks online, or to export them for print. It’s in beta at the moment, so play nice and be sure to report any bugs!

Network diagram mapping the assets of Azeri Officials in Czech Republic, taken from the VIS public gallery


An online tool that turns lists into network structures so you don’t have to fiddle around with positioning when you add entities into your network. It is limited in terms of design options but its simplicity means that you can produce your network sketches pretty quickly.

##Google Fusion Tables

Google Fusion Tables now offers a basic network mapping tool. It has some useful filter functionality and although it lacks the deep customisation options and analysis functionality of Gephi (see below) it can produce insightful visualisations.

OpenOil’s attempt to map BP and its subsidiaries using Google Fusion Tables. More information [here] (


##Gephi

A desktop tool for performing powerful network analysis and creating slick network visualisations. For those interested in experimenting with Gephi, I would recommend that you try to visualise your own Facebook network. School of Data has also published tutorials for mapping company networks and social network analysis.
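
If your data starts life as a list of records rather than a ready-made network, you first need to turn it into an edge list. The sketch below uses only the Python standard library to build a weighted co-tag edge list, in the spirit of the co-tag graph shown earlier, and writes it to a CSV with `Source`, `Target` and `Weight` columns, which are the column names I understand Gephi’s spreadsheet import to look for in an edges table. The posts and tags are invented, and the file name `edges.csv` is an arbitrary choice.

```python
import csv
from collections import Counter
from itertools import combinations

# Invented example: each inner list holds the tags on one post.
posts = [
    ["budget", "roads", "typhoon"],
    ["budget", "typhoon"],
    ["roads", "typhoon"],
]

# Count how often each pair of tags appears on the same post.
# Sorting makes ("budget", "roads") and ("roads", "budget") one edge.
edge_weights = Counter()
for tags in posts:
    for a, b in combinations(sorted(set(tags)), 2):
        edge_weights[(a, b)] += 1

# Write an undirected, weighted edge list.
with open("edges.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edge_weights.items()):
        writer.writerow([a, b, weight])
```

Tags that co-occur more often get a heavier edge, which a layout algorithm can then use to pull related tags together.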

GFK and University of Vienna’s research on the key influencers of the EU Twittersphere
