You are browsing the archive for Daniel Villatoro.

Discover patterns in hundreds of documents with DocumentCloud

- August 20, 2016 in Fellowship, HowTo

If you’re a journalist (or a researcher), say goodbye to printing all your documents, filing them in folders, highlighting them with markers, and adding post-its and labels. The heavy burden of reading, finding repeated information and highlighting it can be carried for you by DocumentCloud: it reveals the names of the people, places and institutions mentioned in your documents, lines up dates in a timeline, and saves your documents in the cloud privately, with the option to make them public later.

DocumentCloud is an open-source platform that journalists and other media professionals use as an online archive of digital documents and scanned text. It provides a space to share source documents.

A major feature of DocumentCloud is how well it works with printed files. When you upload a PDF scanned as an image, the platform reads it with Optical Character Recognition (OCR) to recognize the words in the file. This allows investigative journalists to upload documents from original sources and make them publicly accessible, and it makes the documents much easier to process.

Some other features include:

  • Running every document through OpenCalais, a metadata technology from Thomson Reuters that adds contextual information to the uploaded files. It can take the dates from a document and graph them in a timeline, or help you find other documents related to your story.

  • Annotating and highlighting important sections of your documents. Each note that you add has its own unique URL, so you can keep everything in order.

  • Uploading files in a safe and private manner, with the option to share those documents, make them public, and embed them. The sources and evidence of an investigation don’t have to stay on a journalist’s computer or in a media organization’s archives; they can go public and become open.

  • Reviewing the documents that other people have uploaded, such as files, hearing transcripts, testimony, legislation, reports, declassified documents and correspondence.
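To make the timeline feature above more concrete, here is a minimal sketch of the general idea: pull dates out of a document’s text and sort them chronologically. It is not how DocumentCloud or OpenCalais actually work. The sample text, the `extract_timeline` function and the restriction to `YYYY-MM-DD` dates are all assumptions made for illustration.

```python
import re
from datetime import date

# Hypothetical sample text, invented for this illustration only.
SAMPLE_TEXT = (
    "The contract was signed on 2014-03-12. "
    "A first payment followed on 2014-05-02, "
    "and the audit report is dated 2013-11-30."
)

# Assumption: dates are written as YYYY-MM-DD. Real documents use many formats.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_timeline(text):
    """Return the YYYY-MM-DD dates found in text, sorted chronologically."""
    found = []
    for match in ISO_DATE.finditer(text):
        year, month, day = (int(part) for part in match.groups())
        try:
            found.append(date(year, month, day))
        except ValueError:
            # Skip strings that look like dates but are not valid (e.g. month 13).
            continue
    return sorted(found)


for entry in extract_timeline(SAMPLE_TEXT):
    print(entry.isoformat())
```

A real tool would have to handle many more date formats and languages, so treat this only as a picture of the principle.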

The platform in action

A while ago, an investigation into the manipulation of the purchasing system of Guatemala’s social insurance revealed a network of attorneys, doctors, specialists and patient associations that forced the purchase of certain medicines for terminal patients. It was led by Oswaldo Hernández from *Plaza Pública*, and DocumentCloud was at the core of the investigation process.

“I searched for words like ‘Doctor’ or ‘Attorney’ to find out the names of the people involved. That way I was able to put together a database and the relationships between those involved. It’s like having a big text document where you can explore and search everything”, explains Hernández.
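The keyword search Hernández describes can be imitated on plain text with a few lines of code. The sketch below is an illustration only, not a DocumentCloud feature or API: the sample text, the names in it and the `count_titled_names` function are hypothetical, and the pattern only handles simple, unaccented capitalized names.

```python
import re
from collections import Counter

# Hypothetical sample text with invented names, for illustration only.
SAMPLE_TEXT = (
    "Doctor Ana Morales approved the request. "
    "Attorney Luis Paz filed the appeal, and Doctor Ana Morales signed it. "
    "The attorney was not present."
)

# A title followed by one or more capitalized words (assumed to be a name).
TITLED_NAME = re.compile(r"\b(Doctor|Attorney) ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


def count_titled_names(text):
    """Count how often each (title, name) pair appears in text."""
    return Counter(
        (match.group(1), match.group(2)) for match in TITLED_NAME.finditer(text)
    )


for (title, name), total in count_titled_names(SAMPLE_TEXT).most_common():
    print(f"{title} {name}: {total}")
```

The resulting counts are a starting point for the kind of database of people and relationships that the quote mentions; names with accents or unusual capitalization would need a more permissive pattern.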

When you analyse one of the documents about medicines, DocumentCloud plots the names of the people and institutions that recur in the text.

A screenshot of the graphic analysis that DocumentCloud plots from the uploaded files

Four creative uses of DocumentCloud

Below are some examples of how you can produce different types of content when you mix uploaded information, creativity and the functions of this tool.

The platform VozData, from the Argentinian newspaper La Nación, combines its own code with DocumentCloud’s technology to run an openly collaborative platform that crowdsources the transformation of Senate expense receipts into open and useful information.

After their investigation into violence in a prison was published in The New York Times, *The Marshall Project* did a follow-up on how prison officers censored the names of some guards and inmates, as well as aerial photos of the prison, when the newspaper was distributed to prisoners.

The *International Consortium of Investigative Journalists* (ICIJ) uses DocumentCloud so that readers can access the original documents of the Luxembourg Leaks: secret agreements, approved by the Luxembourg authorities, that reduced taxes for 350 companies across the world.

The *Washington Post* used the software to explain the set of instructions that the US National Security Agency gives to its analysts so that, whenever they fill in a form to access databases and justify their research, they don’t reveal too much suspicious or illegal information.

So, the next time you have to do tons of research using original documents, you can make them publicly available through DocumentCloud. And even if you’re not a journalist, you can still use this tool to browse its extensive catalogue of documents uploaded by journalists across the world.

Reflections from the field #1: It’s not enough to do great work. Talk about it

- August 8, 2016 in Event report, Fellowship

A lot of projects using data are making a great impact. We just don’t notice them because people don’t tend to advocate for their work.

The last session of #CodaBR, the first ever Brazilian Conference of Data Journalism, was a showcase of the groundbreaking work of Latin American journalists. So often, we reference the work of more-developed countries, whose work achieves greater prominence thanks to their rich resources. Countries with a higher rate of internet users and easy access to the latest technologies have a technical advantage and tend to be more able to produce cutting-edge work.

Often, those working in smaller, less-developed countries tend to envy the work that happens in more-developed contexts, but Latin America has proven that the work produced within the continent is not just of good quality, but also tackles social justice issues from within a local context, thereby making the work of greater merit.

In a series of Lightning Talks, three local projects and two from other countries (Peru and Guatemala) showed the impressive range of impacts that using data in stories has had on their work. Many people in the audience commented on how easily data-literacy work in Latin America can be overlooked, due to the overflow of information from more-developed countries and the lack of communication channels for journalists to showcase and advocate for their work.

Here are a few great examples that were presented during the conference:

Ojo Publico and political finance tracking

OjoPublico, a team from Peru, have tracked the corruption that affects the political campaign funding system in their country. The work is comprehensive, with visualizations, tools, and practical explanations of the ways in which these money transfers happen. Antonio Cucho, developer and founder, took us on a tour of the flow of money behind the Peruvian political campaigning system.

Estadão Dados and the failed Brazilian education projects

From Brazil, Daniel Bramatti talked about how he uncovered that the government gave billions of reais to private Brazilian universities in order to increase the number of graduates in the country, but failed to achieve that goal. The work is explained in seven graphs.

The Huffington Post and LGBT-phobia in Brazil

With highly developed storytelling, Daniel Flor showed many cases of LGBT-phobia in Brazil and developed a crowdsourced map for people to report cases. The lesson learned? When there’s no data to work with, build a way to obtain it. Collecting is a vital part of the process of working with data.

TV Globo and the murders in São Paulo

Thought that data journalism was only fit for the web or print? Think again. Luísa Brito showed how one of the mainstream Brazilian TV stations has used data in their video stories. After analysing police records in São Paulo, obtained through freedom of information act requests, TV Globo found that one in every four people murdered in the city was killed by the police.

Plaza Publica and malnutrition in Guatemala

I had the opportunity to talk about how an analysis of a database of the country’s death records revealed that the Guatemalan government had hidden the deaths of children who died of malnutrition.

To me, it was an important lesson to learn. Data literacy practitioners who work in more difficult contexts, with less access to the latest technology and with more challenges in obtaining data that supports stories, can still produce relevant, impactful work.

Event name: 1st Brazilian Conference of Data Journalism
Event type: Conference
Event theme: Data Journalism
Description: Meeting point to discuss the landscape for the production of data-related products in journalism, and to learn basic techniques for data-driven approaches to social change and the use of information
Trainers: Yasodara Córdova, Vitor George, Vadym Hudyma, Natália Mazotte, Marina Atoji, Marco Túlio Pires, Juan Manuel Casanueva, Joana Varon, Humberto Ferreira, Fabiano Angélico, Dirk Slater, David Opoku, Daniel Bramatti
Partners: SocialTIC and Escola de Dados
Location: São Paulo, Brazil
Audience: Journalists, Data Scientists, Communication Officers, Students, Activists, Developers, Designers
Gender split: F 40 / M 60
Duration: 1h