Discover patterns in hundreds of documents with DocumentCloud

Daniel Villatoro - August 20, 2016 in Fellowship, HowTo

If you’re a journalist (or a researcher), say goodbye to printing all your documents, filing them in folders, highlighting them with markers, and adding post-its and labels. The heavy burden of reading, finding repeated information and highlighting it can be done for you by DocumentCloud. It allows you to reveal the names of the people, places and institutions mentioned in your documents, to line up dates in a timeline, and to save your documents in the cloud privately, with the option to make them public later.

DocumentCloud is an Open Source platform, and journalists and other media professionals have been using it as an online archive of digital documents and scanned text. It provides a space to share source documents.

A major feature of DocumentCloud is how well it works with printed files. When you upload a PDF scanned as an image, the platform will read it with Optical Character Recognition (OCR) to recognize the words in the file. This allows investigative journalists to upload documents from original sources and make them publicly accessible, and makes the documents much easier to process.

Some other features include:

  • Running every document through OpenCalais, a metadata technology from Thomson Reuters that adds contextual information to the uploaded files. It can take the dates from a document and graph them in a timeline or help you find other documents related to your story.

  • Annotating and highlighting important sections of your documents. Each note that you add will have its own unique URL so that you can keep everything in order.

  • Uploading files in a safe and private manner, with the option to share those documents, make them public, and embed them. The sources and evidence of an investigation don’t have to stay in the computer of a journalist or the archives of a media organization – they can go public and become open.

  • Reviewing the documents that other people have uploaded, such as files, hearing transcripts, testimony, legislation, reports, declassified documents and correspondence.

The platform in action

A while ago, an investigation into the manipulation of the purchasing system at the Guatemalan social insurance institute revealed a network of attorneys, doctors, specialists and patient associations that forced the purchase of certain medicines for terminal patients. It was led by Oswaldo Hernández from *Plaza Pública*, and DocumentCloud was at the core of the investigation process.

“I searched for words like ‘Doctor’ or ‘Attorney’ to find out the names of the people involved. That way I was able to put together a database and the relationships between those involved. It’s like having a big text document where you can explore and search everything”, explains Hernández.
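
The kind of title-based search Hernández describes can be approximated on any plain text extracted from your documents. The following is a minimal Python sketch, not DocumentCloud's own search: the function name, the list of titles and the sample sentences are all hypothetical and only serve as an illustration.

```python
import re
from collections import Counter

# Hypothetical list of titles to search for; adapt it to your documents.
TITLES = ("Doctor", "Attorney")

def find_titled_names(text, titles=TITLES):
    """Count capitalised names that directly follow one of the given titles."""
    title_group = "|".join(re.escape(t) for t in titles)
    # A title, some whitespace, then one to three capitalised words.
    pattern = re.compile(
        r"\b(" + title_group + r")\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+){0,2})"
    )
    return Counter((m.group(1), m.group(2)) for m in pattern.finditer(text))

# Invented sample text, for illustration only.
sample = (
    "Doctor Ana Lopez signed the request. Attorney Juan Perez filed it. "
    "Later, Doctor Ana Lopez approved the purchase."
)
counts = find_titled_names(sample)
for (title, name), n in counts.most_common():
    print(title, name, n)
```

Names that come up repeatedly are a starting point for a small database of people and their relationships. Note that this simple pattern only matches unaccented Latin letters, so it would need adjusting for names with accents.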

When analysing one of the documents about medicines, DocumentCloud shows the names of people and institutions that are repeated in the text in a graphic plot.

A screenshot of the graphic analysis that DocumentCloud plots from the uploaded files

Four creative uses of DocumentCloud

Below are some examples of how you can produce different types of content when you mix uploaded information, creativity and the functions of this tool.

The platform VozData, from the Argentinian newspaper La Nación, combines its own code with the technology of DocumentCloud to set up an openly collaborative platform that transforms Senate expense receipts into open and useful information by crowdsourcing it.

After its investigation into violence in a prison was published in The New York Times, *The Marshall Project* did a follow-up on how prison officers censored the names of some guards and inmates, as well as aerial photos of the prison, when the newspaper was distributed to prisoners.

The *International Consortium of Investigative Journalists* (ICIJ) uses DocumentCloud so that readers can access the original documents of the Luxembourg Leaks: secret agreements, approved by the Luxembourg authorities, that reduced taxes for 350 companies across the world.

The *Washington Post* used the software to explain the set of instructions that the US National Security Agency gives to its analysts, so that whenever they fill in a form to access databases and justify their research, they don’t reveal too much suspicious or illegal information.

So, the next time you have to do tons of research using original documents, you can make them publicly available through DocumentCloud. And, even if you’re not a journalist, you can still use this tool to browse its extensive catalogue of documents uploaded by journalists across the world.

Call for a week-long data journalism training in Berlin

Nika Aleksejeva - August 18, 2016 in Events, Fellowship

Photo from a data visualization training in Istanbul, 2014. Author: Nika Aleksejeva

‘Data-driven journalism against prejudices about migration’ training course for young media-makers, human rights activists and developers

Berlin, 12 – 20 November 2016

Deadline for receiving applications is: 31st August 2016, 23:59h CET.


School of Data fellow, Nika Aleksejeva, in collaboration with European Youth Press (EYP), an umbrella association of young media-makers in Europe, is inviting young media-makers, designers/developers/programmers and human rights activists to participate in a week-long data journalism training. The training aims to produce impartial, data-driven reports on local migration issues using innovative storytelling forms. It will address the current European refugee crisis, from the perspective of 11 European countries (listed below).

What to expect?

The main objective of the training course is to increase data journalism skills through hands-on training and through working on a real story that will eventually be published in the media. During the project, EYP will partner with established media organisations from the eleven listed countries, each of which will send one journalist to attend the training. Working together, participants will learn data journalism skills and immediately apply them to practical scenarios. The finished results of their work will be published by media partners of the project. It is hoped that this broad public outreach will have a significant effect on the media’s treatment of the issue. This course will be an opportunity to strengthen an already-established international network of young media-makers, mid-career journalists and activists concerned with migration and refugee rights.

Participants of the training course will:

  • learn and practice data journalism techniques: finding the right data, scraping, compiling, cleaning, storytelling with data;

  • form teams and work on specific projects, with a view to publication in the national media of participants’ home countries;

  • make professional contacts in the field and obtain hands-on experience of working on a cross-border, data-driven investigation.

Financial Information

This training course is funded by the Erasmus+ grant. Participants will receive reimbursement of their travel costs **up to the amount indicated below**, according to their country of residence:

  • Armenia: 270 EUR

  • Belgium: 170 EUR

  • Czech Republic: 80 EUR

  • Denmark: 80 EUR

  • Germany (outside Berlin): 80 EUR

  • Italy: 170 EUR

  • Latvia: 170 EUR

  • Montenegro: 170 EUR

  • Slovakia: 170 EUR

  • Sweden: 170 EUR

  • Ukraine: 170 EUR

  • participants living in Berlin will not be eligible for reimbursement of any travel expenses.

Although travel costs will be reimbursed, participants are asked to make the travel bookings themselves, as soon as possible after being selected. Participants are also asked to take the most economical route from their place of residence to Berlin and use the following means of transportation:

  • Train: 2nd class ticket (normal as well as high-speed trains),

  • Flight: economy-class air ticket or cheaper,

  • Bus

Accommodation, meals and all necessary materials will be provided.

Who can apply?

Applicants must fulfil all the criteria below:

  • young media-makers, journalism students, bloggers and citizen journalists with a demonstrated interest in issues related to the rights of ethnic minorities, migrants and refugees; human rights activists working on refugee/migration issues; developers interested in the topic;

  • 18-30 year-olds;

  • residents of Czech Republic, Germany, Belgium, Italy, Sweden, Armenia, Ukraine, Montenegro, Slovakia, Denmark and Latvia;

  • proficient in English.

How to apply?

Interested candidates are invited to apply by completing this application form. Please also send your CV, in Europass format, and via e-mail, to applications@youthpress.org with ‘ddj on migration’ in the subject line.

The deadline for receiving completed applications (form and CV) is: 31st August, 23:59h CET.

In Latvia, a plea for citizens to push for data-driven public policy

Cedric Lombion - August 18, 2016 in Event report, Fellowship

Data is the core substance required for evidence-based policies and decision-making. “How do we make Latvia the country that makes most use of data to inform its decision-making?” was the question that Latvian MPs and civil-society representatives tried to answer over an hour and a half on the hot morning of July 2nd, on the occasion of the second edition of the national political festival, LAMPA.

This festival, funded by the DOTS foundation, aims to clarify the concept of open data, which is still new for Latvian law-makers, who often confuse it with public data. The discussions there serve as a good incentive to put data into the hands of regular citizens and encourage them to participate in national, evidence-based policy-making.

The roadblocks to evidence-based decision-making

None of the participants denied the importance of evidence in decision-making. Nevertheless, many alarming issues were detected. Open data, and engaging civil society in its use, was seen as one of the best short-term solutions for producing more thoughtful policy-making.

First, the State Controller, Elita Krumina, raised the issue that evidence – based on statistics, research documents and research papers – needs to be revised every year. There are many policies based on outdated evidence, even though the real situation has actually changed.

Another issue, raised by the Head of State Secretary Office, Martins Krievins, was that decisions are often made quickly and there is no time for lengthy research and data-gathering. At the same time, Krumina suggested that a great deal of research is conducted, but the benefit is small: “These papers repeat already-known principles of good governance without giving many data-driven solutions,” she explained.

The problem of trust

“The problem is, we don’t trust much evidence,” says Krievins. He gave an example of the census results: “First, everyone said that the data is incorrect because more people left the country than was counted. Then, when the state conducted an outsourced census, the first question was – whom did the hired company pay in bribes?”

Krievins said that data can be easily manipulated based on policy goals, whereas parliamentarian and experienced politician, Sergejs Dolgopolovs, said that he thinks it’s important to set goals and assess all the risks in order to make better decisions.

Later, Krievins admitted that there are many complex issues with evidence that may encourage a bad decision to be made: “Everyone realises that small schools in the countryside are expensive – the evidence is clear. Nevertheless, schools in the countryside are cultural centres for the local area, hosting many social events. There would be a broad social impact if small schools were to be closed.”

Ernests Jenavs, the founder and CEO of Edurio, an app that helps users to make evidence-based decisions in education, said that evidence should be separated from ideology: “Data should be analysed by independent people, not politically biased decision-makers,” says Jenavs. He suggested opening data, so that politically independent civil society members can suggest evidence-based solutions. Nika Aleksejeva, the Head of School of Data Latvia, agreed with this point, adding that there is a need for enhancing data-literacy in Latvian society and encouraging people to use open data.

Technology allows us to engage with society faster and more cheaply than before, agreed both Jenavs and Aleksejeva.

The discussion was concluded by a unanimous message from the panel – there should be much more pressure from civil society for evidence-based decisions in government, and data should be open for everyone to be able to contribute to this decision-making.

Video (in Latvian): link


Infobox
Event name: Festival “Lampa”, discussion “How to make Latvia the greatest country of evidence based policy-making?”
Event type: Roundtable
Event theme: open data and data-driven public policy
Description: Possibilities to execute more evidence based and data-driven policies in Latvian government
Speakers: Ernests Jenavs (the founder and CEO of Edurio), Nika Aleksejeva (the Head of School of Data Latvia), Sergejs Dolgopolovs (parliamentarian), Elita Krumina (the State Controller), Ilze Vinkele (parliamentarian), Martins Krievins (Head of State Secretary Office), Valts Kalnins (the lead researcher at think-tank PROVIDUS)
Partners: NA
Location: Cesis, Latvia
Date: July 2
Audience: civil society representatives, analysts, others
Number of attendees: NA
Gender split: NA
Duration: 1 hour

Feedback from the 2016 Summer Camp: Precious

Precious Onaimo - August 16, 2016 in Event report, Fellowship

From May 15th to 21st, 40 people from 24 countries gathered at Ibiuna in Sao Paulo, Brazil, for the 2016 School of Data Summer Camp. Precious Onaimo, a 2016 School of Data Fellow from Nigeria, shares his thoughts about the event.

Aerial view of venue for Summer Camp 2016, Ibiuna, Sao Paulo, Brazil

Amid Brazil’s scandal over alleged presidential fiscal irregularities and the global health concern over the ravaging Zika virus, a serene gathering of data literacy practitioners took place. They convened in Brazil on the occasion of the yearly School of Data Summer Camp.

As it is the goal of School of Data to enlist new data Fellows into its global family of data journalists, 10 Fellows from 9 countries and 3 continents were among the enthusiastic audience that gradually trickled into the beautiful and peaceful reserve that served as the venue of the 2016 Summer Camp, with high expectations of an educational and refreshing data journalism seminar.

The first School of Data Summer Camp took place in 2014. It is an occasion for School of Data to evaluate the activities of the previous year and develop blueprints for the next year. High among the yearly goals for School of Data is data literacy training for the newly inducted Fellows. In the mornings, the School of Data Summer Camp 2016 attendees were divided into two tracks:

  1. The Governance Track

  2. The Fellowship Track

The Governance track consisted of representatives of member organisations of the School of Data network, former Fellows, members of the School of Data Steering Committee, Marco Tulio Pires, School of Data Programme Manager, and Dirk Slater, the official event facilitator. Participants held several sessions dealing with administrative and oversight duties for the year 2016 and finally elected the Steering Committee, which will be charged with the oversight function for the year 2016 / 2017.

The Fellowship track comprised all the new Fellows – Nika, Omar, Malick, Danny, Ximena, Kabu, Raisa, Vadym, Paul and myself – as well as representatives from Fellowship partner organisations (Katarina, Tin and Sergio), senior Fellows and some members of the School of Data coordination team (Cedric, Katelyn and David). To equip us for the task of promoting data literacy, and informing public debate and policy through data journalism in our respective countries, the track facilitators organised a series of data skills training sessions. Some of the topics covered during these sessions included: “Community Mapping How to”, “Setting Fellowship Roadmaps”, “School of Data’s Data Pipeline”, “Event Planning and Anchoring”.

School of Data’s New Data Fellows

During the afternoons, everyone took part in the Data Literacy track which was filled with additional training sessions. These included sessions such as ‘How to sell your Ideas’, ‘Responsible Data’, ‘Impact Assessment’, ‘Offline Data Collection’ and ‘Simple Statistical Analysis’.

These sessions trained me to sell my development ideas or initiatives convincingly to relevant stakeholders by concentrating on how the suggested initiative would help them save money, save time, make money or make time. The ability to attach a cost-saving analysis to a discussion or argument makes a far-reaching impression on the minds of listeners. Impact assessment, another skill that I learned about in these sessions, helps a project manager evaluate the effect a project would have on the intended community based on the opinions and preferences of the target audience. This is done through a series of iterative feedback assessments from the target community. This approach ensures that the project properly reflects the needs of the community and remains relevant and sustainable.

At the end of the 5 days, we had our heads filled with new data skills to be transferred to a diverse audience in our respective countries. We also left the camp with lingering memories of newly formed friendships, bonds and networks that would last a life-time.

Bottle time with friends

On Saturday May 28, 2016, as part of an educational summit organized by Escola de Dados (School of Data Brazil), facilitators from almost all journalistic realms came for one day to Sao Paulo to share their experiences, skills, knowledge, challenges and failures with a very enthusiastic audience. Although major parts of the programme were conducted in Portuguese, and were consequently not accessible to the Anglophone audience, a few sessions were conducted in English, including Introduction to R Programming, Advanced Statistical Analysis, and Data and Digital Security.

Looking back at the many events of this Summer Camp, I will remember the very educational and informative Fellowship sessions, the “all-eyes-on-you” morning go-arounds anchored by Dirk, the different but surprisingly delicious meals, the chilly mornings and the enchanting Escola de Dados summit. Also worthy of mention and appreciation are the hard work, careful planning and forethought of Marco, Natalie and Meg (the invisible hand) in putting together this very memorable event. Once again, “Thank you!”

Summer Camp 2016 has come and gone but its values and ideals continue to grow.

Building an Open Data Ecosystem in Tanzania with trainings and stakeholder engagement

Joachim Mangilima - August 14, 2016 in Community, Event report

Open data is often defined as a product: events, portals, hackathons, and so on. But what does the process of opening data look like? In Tanzania, among many other things, it’s a gradual, iterative process of building capacity in Tanzanian government, civil society and infomediaries to manage, publish and use open data. Of late, the open data scene in Tanzania has been growing from strength to strength.

Participants in an open data training session related to the Tanzanian health sector

The following milestones are testimony to this growth:

  • last September, Tanzania hosted the first ever Africa Open Data Conference (AODC).

  • the drafting of the country’s open data policy, which is in the final stages of government approval before it can be passed as policy.

  • formation of the Code for Tanzania chapter, which, among others, will spearhead establishment of local chapters of the global Hacks/Hackers community, as well as a flagship civic technology ‘CitizenLab’, with a core team of software engineers, data analysts and digital journalists, who will work with local newsrooms and social justice NGOs.

  • the establishment of Tanzania Data Lab (DLab), serving as an anchor for the Data Collaboratives for Local Impact (DCLI) programme, which aims at enabling data analysis and advocating for its prominent use in Tanzanian governmental decision-making. Since the exciting news broke that Tanzania will be joining the Global Data Partnership, the DLab has also started working with the Tanzania National Bureau of Statistics, and other stakeholders, to support the process of assessing what data is needed to drive progress, as defined in the Global Data Partnership Roadmap and, ultimately, leverage the data revolution to achieve the Sustainable Development Goals.

The Tanzania Open Data Initiative

April and June saw another round of training organised under the Tanzania Open Data Initiative (TODI) umbrella, geared towards Tanzanian government agencies covering three key sectors: Education, Health and Water. These collaborative sessions, tailored towards civil servants working with data related to these sectors, have been running every year since 2014. They focus on building skills in managing, cleaning, visualizing and publishing data, on open data principles for navigating the legal and professional challenges of managing open data innovation, and on communicating results to a wider audience.

Often, these sessions produce as many questions as answers – “How precisely do we define ‘access to water’ in rural areas?” or “What does an ‘average passing rate’ really mean?” – but this is encouraged. Indeed, we’re already noticing that a primary beneficiary of open data initiatives is the government itself. Although conventionally billed as a tool for citizens, open data can also be a powerful mechanism to reduce frictions among the multitude of ministries, departments, and agencies (MDAs) of a government.

One notable difference between these rounds in April and June, and previous ones, was that a few selected participants from civil society were in attendance. This enriched the quality of discussion, which resulted in increased engagement from all participants during the sessions: their presence facilitated the sharing of experiences for mutual understanding, and thereby collaboration between the government and civil society.

Open Data in a day

June’s week-long sessions culminated in an “open data in a day” event at Buni Hub, which for the very first time had a strong focus on media and technology developers. It was amazing seeing the enthusiasm and the level of interaction of this group and how excited they were to put into action key takeaways from the session.

Participants from the media and technology industry at the Open Data in a Day event at Buni Hub.

These activities are testimony to the progress that Tanzania is making in the open data arena and, with similar activities planned for the future, there is good reason to expect the country’s open data ecosystem to experience further growth in strength and quality.


Infobox
Event name: Tanzania Open Data Initiative
Event type: Workshop
Event theme: Open data in practice
Description: Training organized under Tanzania Open Data initiative collaboratively between National Bureau of Statistics and E-Government Agency supported by the World Bank tailored towards civil servants working with data
Trainers: Dave Tarrant, Emil Kimaryo, Joachim Mangilima, John Paul Barreto
Partners: Open Data Institute (ODI)
Location: Dar es Salaam, Tanzania
Date: 7th – 14th June 2016
Audience: Statisticians, economists and data managers from ministries and government agencies for the first two sessions, and journalists, start-up developers and civil society for the third session
Number of attendees: 95 across the three sessions
Gender split: almost 50/50
Duration: 6 days

Feedback from the 2016 Summer Camp – Kabu

Kabu Muhau - August 12, 2016 in Event report, Fellowship

From May 15th to 21st, 40 people from 24 countries gathered at Ibiuna in Sao Paulo, Brazil, for the 2016 School of Data Summer Camp. Kabukabu Muhau, a 2016 School of Data Fellow from Zambia, shares her thoughts about the event.

SCODA 2016 Fellows. Left to right; Raisa, Danny, Ximena, Omar, Malick, Paul, Kabu, Vadym, Nika and Precious

Ola!

Yep, I know one Portuguese word thanks to the School of Data Summer Camp held in Ibiuna, Brazil! Exciting, right? But don’t you dare judge me for learning only one word. There was so much happening I could barely keep up! Plus the food was amazingly delicious; my mouth was always full of it!

Omar and I getting more food ☺☺☺

Members of the community were exceptionally welcoming. I’ve never met so many people with such a passion for data! The camp brought together people with different data-literacy backgrounds and it was really awesome to learn about data from different perspectives.

So you may ask, what did you learn? Well, my main purpose in applying for the School of Data Fellowship was to learn new data skills that could be applicable in my home country, Zambia, specifically in the health sector. To better explain the skills I learnt in the various data-literacy sessions I attended (wish I could’ve attended more!), I will use the data pipeline. A data pipeline simply shows the different processes involved in data management. There are seven main stages in the data pipeline:

  1. The data pipeline starts with the DEFINE step, which is the same as problem identification; it is usually the first step in research. Defining a workable research problem usually involves three steps:
    1. Selecting a topic area

    2. Selecting a general problem

    3. Reducing the general problem to a specific, precise and well delimited problem by listing possible answers to the general problem.

  2. Next comes the FIND step, wherein you have to find where the data you need is available. This involves various techniques, from using Google Search operators to identifying how the data could be collected in your environment.

  3. Then comes the GET step. In this part of the pipeline, you are required to collect the data that relates to the selected/identified problem. Different methods can be used to collect data depending on the selected problem.

  4. Data VERIFICATION is done after collection, to check whether the data collected is valid or not.

  5. CLEANING is done to remove inconsistencies in the data.

  6. Data ANALYSIS is a process of evaluating the data and converting it to something more meaningful that could be used for decision making.

  7. PRESENTATION is the final stage in the data pipeline, where the analysed data is displayed by use of maps, graphs or tables or any other means.
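
The stages above can be sketched as a chain of small Python functions. This is only an illustrative sketch under stated assumptions: the question, the inline CSV data and all function names are invented for the example, and a real project would fetch its data from an actual source in the FIND and GET steps.

```python
import csv
import io
from statistics import mean

# DEFINE (hypothetical question): what is the average number of
# clinic visits recorded per district?

# FIND / GET: a real project would locate and download the data;
# here an invented inline CSV stands in for it.
RAW = """district,visits
Lusaka,120
 lusaka ,80
Ndola,abc
Kitwe,60
"""

def get(raw):
    """GET: read the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))

def verify(rows):
    """VERIFY: keep only rows whose visit count is a whole number."""
    return [r for r in rows if r["visits"].strip().isdigit()]

def clean(rows):
    """CLEAN: normalise district names and convert counts to integers."""
    return [
        {"district": r["district"].strip().title(), "visits": int(r["visits"])}
        for r in rows
    ]

def analyse(rows):
    """ANALYSE: compute the average number of visits per district."""
    by_district = {}
    for r in rows:
        by_district.setdefault(r["district"], []).append(r["visits"])
    return {d: mean(v) for d, v in by_district.items()}

def present(result):
    """PRESENT: format the result as a simple text table."""
    return "\n".join(f"{d}: {avg:.1f}" for d, avg in sorted(result.items()))

result = analyse(clean(verify(get(RAW))))
print(present(result))
```

Keeping each stage in its own function makes it easy to see where a row was dropped (VERIFY) or changed (CLEAN) before it reached the analysis.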

Various knowledge exchange sessions were held throughout the camp, which allowed me to learn many amazing skills that I will use through my fellowship and beyond:

“How to sell”, by Nika, another School of Data Fellow, from Latvia, was a session about the skills required to sell a project idea to potential partners. It mainly focuses on four key points: NEED, FACT, ATTRIBUTE and BENEFIT.

  • Need: talk about what has to be addressed

  • Fact: give evidence of similar projects you have done in the past and any results yielded

  • Attribute: mention any qualifications that you hold

  • Benefit: simply mention what the person or organisation will gain from partnering with you.

Yuandra Ismiraldi, a 2014 School of Data Fellow from Indonesia, and Malick Lingani, a 2016 School of Data Fellow from Burkina Faso, conducted a session on how data can be collected using sensors. They explained how sensors can be used to measure the pH of water to determine its quality, and also to measure temperature and humidity for weather forecasting. The information collected from sensors is temporarily stored in a SIM card via a GPRSbee, a SIM card socket which allows the use of SIM cards as a temporary storage solution. The geocoded information is then sent to a central server at defined time intervals (every minute, hourly, daily).

I also learnt about a very important data cleaning tool called OpenRefine, which can be used to highlight inconsistencies in a dataset and then clean them. It is so easy to use, and I’ve already started practising cleaning datasets with it. I think it is a much better tool for data cleaning than Excel, which I am used to.
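
To give an idea of what highlighting inconsistencies can look like, here is a small Python sketch that groups spelling variants of the same value by a normalised key. It is loosely inspired by the idea behind clustering in tools like OpenRefine, but it is a simplified stand-in and not OpenRefine's actual implementation; the function names and sample values are made up for illustration.

```python
import string
from collections import defaultdict

def fingerprint(value):
    """Build a normalised key: trimmed, lowercased, punctuation removed,
    with the words de-duplicated and sorted."""
    cleaned = value.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(sorted(set(cleaned.split())))

def cluster(values):
    """Group values that share a key; keep only groups with variants."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

# Invented sample values, for illustration only.
names = [
    "Ministry of Health",
    "ministry of health ",
    "Health, Ministry of",
    "Ministry of Education",
]
for group in cluster(names):
    print(group)
```

Each printed group is a set of variants that probably refer to the same thing and could be merged into one spelling.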

Finally, while I usually use SPSS for data analysis, I was introduced to the statistics software R at the camp and will soon be exploring how it works! Tableau is a data visualisation tool I was also introduced to, and I plan to explore it more. In addition, whenever I find myself in a situation where I need to present my data without using a computer, I will always refer to the work of Sylvia Fredriksson, from Ecole des données (School of Data France), on how to present data physically.

Sylvia’s physical data presentation session

During the Summer Camp, Fellows had a specific set of sessions tailored to their needs: the Fellowship track. It was run by three amazing coordinators – Camilla, Cedric and David – and it introduced us to the Fellowship programme. Thankfully, they guided us patiently, making the whole programme seem manageable. It was really exciting to meet people from different countries, to learn about their cultures and what they use data for in their everyday lives. I now have a bigger data-literacy family, all thanks to Summer Camp!

Muito obrigado SCHOOL OF DATA for this awesome opportunity!!! I guess I learnt more than one Portuguese word…

We had a whole lot of fun!

SCODA African team

SCODA 2016 African fellows.

Using data for improving cyclist community in Riga, Latvia

Nika Aleksejeva - August 10, 2016 in Event report, Fellowship

Nika Aleksejeva presenting the project

Would you believe that a socially relevant, data-driven project can be accomplished without a budget, a big team and full-time staff? How? This question was the focus of the ‘How we did it?’ meetup in Riga, Latvia. It was the final point of #Velodati – a data-driven project that crowdsourced geographic data about cycling mobility in Riga, initiated and conducted by the School of Data Latvian local group (Datu skola).

Datu skola’s mission is to facilitate data-driven projects, conducted by journalists and activists, in collaboration with data analysts and programmers. The #Velodati project serves as an example of such projects.

As a result, an interactive online map was created showing the busiest cycling routes and how they overlap with the network of cycle tracks in Riga.

screenshot of the project

The project took nothing more than three months of one person’s work and 37 euros for posters that encouraged Riga cyclists to share GPS recordings of their routes during the Riga Cycling Week in May. This was possible thanks to the open source and freemium tools used to create the crowdsourcing campaign and to clean and visualize the data. As a result, the online map got over 16.2K views (in a country with a population of 1.9M) and received coverage in eight national media outlets.

Here is a list of tools used for every part of the data project:

Project phases: crowdsourcing campaign, data collection, data cleaning, data visualization and publication
Froont: campaign’s web page
Animaker: video animation
Typeform: survey sharing instructions
Google Docs: data recording instructions
Zapier: email automation
Gmail: data compilation
“Save emails and attachments” Google Spreadsheet add-on: organising data
QGIS: data cleaning, formatting
CartoDB: map visualization
Tableau Public: survey data visualization
Social media (Twitter, Facebook): social media campaign promotion of results

Each tool was demonstrated during the first part of the event. Attendees were particularly interested in Animaker, the video editing tool; Zapier, the cross-platform integration tool; the “Save emails and attachments” Google add-on, which automatically organises email attachments on Google Drive; and CartoDB, a geographic data visualization tool.

Attendees also wanted to know why the data was visualized using points instead of lines, and how the person who cleaned the data chose which routes to keep or delete. Some also started to wonder how to improve the crowdsourcing campaign to attract more data submissions.

This was a great warm-up for the second part of the event. Participants split into three working groups to brainstorm about next steps for the project.

  • One group discussed how the project could be improved for more impactful, data-driven results.

  • Another group discussed how to lobby Riga municipality for better cycling infrastructure in the capital.

  • Finally, there was a group which brainstormed ideas for other data journalism projects.

All groups concluded that it’s useful to combine cycling data with data about public transportation. Bicycles can serve as a good alternative, not only for cars, but also for reaching areas of the city where public transportation is inconvenient. Research, such as that conducted as part of this project, could be used to make evidence-based decisions regarding improving citizen mobility in Riga.

The tools and methods used to produce the #Velodati story will be shared as learning modules on the School of Data international page.

audience

Infobox
Event name: Velodati – How we did it?
Event type: Meetup
Description: A reflection on the methodology and tools used to produce the “Velodati” story.
Trainers: Nika Aleksejeva
Partners: No
Location: Riga, Latvia
Date: July 5th
Audience: Journalists, cycling community representatives, analysts, civic society representatives, others
Number of attendees: 23
Gender split: NA
Duration: 3 hours

Reflections from the field #1: It’s not enough to do great work. Talk about it

Daniel Villatoro - August 8, 2016 in Event report, Fellowship

A lot of projects using data are making a great impact. We just don’t notice them because people don’t tend to advocate for their work.

During #CodaBR, the first ever Brazilian Conference of Data Journalism, the last session of the event was a showcase of the groundbreaking work of Latin American journalists. So often, we reference the work of more-developed countries, which achieves greater prominence thanks to their rich resources. Countries with a higher rate of internet users and easy access to the latest technologies have a technical advantage and tend to be better able to produce cutting-edge work.

Often, those working in smaller, less-developed countries tend to envy the work that happens in more-developed contexts, but Latin America has proven that the work produced within the continent is not just of good quality, but also tackles social justice issues from within a local context, thereby making the work of greater merit.

In a series of Lightning Talks, three local projects and two from other countries (Peru and Guatemala) showed the impressive range of impact that using data in stories has had on their work. Many people in the audience commented on how easily data-literacy work in Latin America can be overlooked, due to the overflow of information from more-developed countries and the lack of communication channels for journalists to showcase and advocate for their work.

Here are a few great examples that were presented during the conference:


Ojo Publico and political finance tracking

OjoPublico, a team from Peru, have tracked the corruption that affects the political campaign funding system in their country. The work is comprehensive, with visualizations, tools, and practical explanations of the ways in which these money transfers happen. Antonio Cucho, developer and founder, took us on a tour of the flow of money behind the Peruvian political campaigning system.

http://fondosdepapel.ojo-publico.com/


Estadão Dados and the failed Brazilian education projects

From Brazil, Daniel Bramatti talked of how he uncovered the way the government gave billions of reais to private Brazilian universities in order to increase the number of graduates in the country, but failed. The work is all explained in these seven graphs:

http://blog.estadaodados.com/fies/

The Huffington Post and LGBT-phobia in Brazil

With highly developed storytelling, Daniel Flor showed many cases of LGBT-phobia in Brazil and developed a crowdsourced map for people to report cases. The lesson learned? When there’s no data to work with, build a way to obtain it. Collecting is a vital part of the process of working with data.

http://projects.brasilpost.com.br/lgbtfobia/

http://mtrpires.github.io/caj2016-huff/


TV Globo and the murders in São Paulo

Thought that data journalism was only fit for the web or print? Think again. Luísa Brito showed how one of the mainstream Brazilian TV stations has used data in their video stories. After analysing police records in São Paulo, obtained through freedom of information act requests, TV Globo found that one in every four people murdered in the city was killed by the police.

http://g1.globo.com/sao-paulo/noticia/2016/04/uma-em-cada-4-pessoas-assassinadas-em-sp-foi-morta-pela-policia.html


Plaza Publica and malnutrition in Guatemala

I had the opportunity to talk about how an analysis of the country’s database of death records revealed that the Guatemalan government hid the deaths of children who died of malnutrition.

To me, it was an important lesson to learn. Data literacy practitioners who work in more difficult contexts, with less access to the latest technology and with more challenges in obtaining data that supports stories, can still produce relevant, impactful work.


Infobox
Event name: 1st Brazilian Conference of Data Journalism
Event type: Conference
Event theme: Data Journalism
Description: Meeting point to discuss the landscape for the production of data related products in journalism, learning basic techniques about data-driven approach to social change and use of information
Trainers: Yasodara Córdova, Vitor George, Vadym Hudyma, Natália Mazotte, Marina Atoji, Marco Túlio Pires, Juan Manuel Casanueva, Joana Varon, Humberto Ferreira, Fabiano Angélico, Dirk Slater, David Opoku, Daniel Bramatti
Partners: NIC.br, SocialTIC and Escola de Dados
Location: NIC.br Sao Paulo, Brazil
Audience: Journalists, Data Scientists, Communication Officers, Students, Activists, Developers, Designers
Gender split: F 40/ M 60
Duration: 1h
Website: http://coda.escoladedados.org

Feedback from the 2016 Summer Camp: Malick

Malick Lingani - August 6, 2016 in Event report, Fellowship

From May 15th to 21st, 40 people from 24 countries gathered at Ibiúna in Sao Paulo, Brazil, for the 2016 School of Data Summer Camp. Malick Lingani, a 2016 School of Data Fellow from Burkina Faso, shares his thoughts about the event.

Photo: Hill climbing by the Campers, a good way to start the day

Summer Camp is the beginning of the eight-month School of Data Fellowship, where new members of the family learn from their elders and work to set the goals that will govern their work in the coming months. In a private residence at Ibiúna, Sao Paulo, the new data explorers found an ideal environment in which to stimulate their brains. The location was relatively quiet and entirely green, in the middle of Amazonia and next to a lake, where Escola de Dados (School of Data in Portuguese) welcomed us. Despite the limited internet connection, I will say that the continuous flow of great Brazilian cuisine, prepared by the most humble chefs, definitely made the camp a success. Before getting into the nitty-gritty, it is appropriate to present School of Data.

School of Data: what is it?

School of Data is a project of Open Knowledge International, launched in May 2012, which aims to empower civil society by teaching the skills necessary to use open data. The project was born of the observation that civil society (citizens, NGOs, journalists, associations, etc.) could greatly benefit from the power of open data but lacks the skills needed to understand, analyze and utilise it effectively.

What happens at Summer Camp?

A typical day at Summer Camp began at 9:00 AM, after breakfast, with a gathering next to the lake called the “opening circle”, a session designed to provide an overview of the day’s objectives. Dirk Slater, the Summer Camp facilitator, led this session in a pretty relaxed way, to put everyone at ease and encourage everyone to participate:

“What is the first thing you will do once you are back home?” asked Dirk, and Kabu replied: “Find my cousin and have fun.” (Laughs…) That was on day 5; just imagine!

After the Opening Circle, participants dispersed to either the “Governance Track” or the “Fellowship Track.” The Governance Track gathered mentors tasked with working on the governance of the organization, reviewing the past year, setting new goals and electing the new Steering Committee. I participated in the Fellowship Track, in which we worked to set our fellowship objectives.

Photo: Paul explaining his objectives to other Fellows and mentors

Overall, each of the Fellows will work to promote the use of data by journalists, CSOs and other interested parties by communicating and organizing training sessions, and also by producing online training modules, tutorials and blog posts. In doing this exercise, we were also assisted by experts in the topic in which we will be promoting the use of data. To recap, the topics for this year are: extractives, health, ethical uses of data, data journalism and gender issues.

In a second step, we determined the tasks needed to achieve our objectives and scheduled them. Here, David reminded us that we have just 10 days a month for our Fellowship work and that, because of this, we should avoid scheduling several important activities in the same month.

Photo: Cédric Lombion commenting on our programmes

Topic experts also assisted us in this phase. My topic is Extractives and Katarina from Natural Resource Governance Institute (NRGI) helped me identify workshops and conferences in which I could participate as a Trainer or Speaker.

During the afternoon, we shared our experiences of data-literacy activities. Every day, 2 sessions were held, during which 5 workshops were run simultaneously. I learned about socio-economic and environmental analyses of EITI data from several African countries. Katarina facilitated this session.

Photo: Katarina presenting some analyses from Extractives data

Many other sessions enriched me hugely during the five days of Summer Camp. Among others, there were:

  • ‘How to sell ideas?’ by Nika Aleksejeva,

  • ‘Introduction to Tableau’, by Daniel Villatoro,

  • ‘What visualization for what purpose?’, by Dirk Slater and

  • ‘How to mobilize data journalists in Turkey’ by Pinar Dag.

(The Data Literacy Pipeline I found on a wall, by accident).

Photo: Data Literacy Pipeline

Around 7:00 p.m. came the “Closing Circle”, a moment for a brief review of the day. Dirk asked everyone to share their most remarkable experience of the day, including any difficulties they had encountered. The “Closing Circle” was also the moment when surprise announcements were made. I particularly remember when Marco announced on May 20th that we would have a party that night. Samba, Tango and Salsa were the highlights of the dancefloor. Yes, that’s also part of the SCODA Camp (School of Data Camp). Another flagship announcement was the programme of the Conference of Data Journalism of Brazil, to be held on May 21st.


The #CODA-BR: The Conference of Data Journalism of Brazil

On the evening of May 20th, we left the camp, our green paradise, for a three-hour drive to the metropolis of Sao Paulo. We alighted with our suitcases in the center of this city, which is the symbol of the emergence of Brazil. Our friends from Escola de Dados had carefully arranged everything. On May 21st, from 9:00 a.m., we were treated to a rich and memorable conference. Experiences of data journalism in Brazil and Latin America were shared in plenary sessions.

Photo: Juan presenting data journalism experiences from Latin America.

Afterwards, various workshops took place, focused on data processing tools and techniques, data analysis and visualization. Analysis of data with Google’s tools, encryption and digital security for journalists, data visualization with d3.js, introduction to R and Python for journalists and OpenRefine for fast data processing were among other sessions not to miss.

Photo: Data analysis for journalists with Google tools by Marco.

I left the Summer Camp filled with confidence that our contribution, as modest as it may be, to a more “Data-Literate” world will spread like wildfire. I must give back this knowledge to my community as prescribed. So, it is with pleasure that I will work with journalists, CSOs, students and researchers to advance Data Literacy back home.

Data Journalism in Turkey: still a new topic

Pınar Dag - July 27, 2016 in Event report, Research

School of Data now counts among its ranks a local group in Turkey led by Pınar Dağ, an experienced data journalist and journalism professor based in Istanbul. As part of their activities, they have been running numerous data journalism trainings, attracting a significant proportion of non-journalists eager to learn about data. The article below presents a data-driven overview of these workshops.


According to a survey of all 110 participants in the data journalism workshops hosted by Pınar Dağ and Sadettin Demirel, 36.4% of participants stated that they had taken data literacy courses before the workshops, while the remaining 70 people (63.6%) had never before studied data literacy or data analysis.

 

If we analyse the data by gender and age, 21 men and 17 women said they had previous training in data analysis. However, the interesting figure comes from the younger generation: 65% of participants are aged between 18 and 25, and they have no experience or previous training in data literacy.

 

These numbers indicate that, although participants were introduced to data analysis and data literacy through the data journalism workshops, those in the 18-25 age range have a serious gap in data literacy.

 

 

The workshops contribute to spreading data journalism terminologies

 

 

Data journalism has its own terminology and vocabulary. To evaluate how participants learned the main terms, such as data journalism, open data, open government, data portal and data visualization, we asked them whether they had learned these terms through the workshops or elsewhere.

 

 

More than half of the participants said that they already knew the terms data journalism and open data. On the other hand, 36 people said that they came to understand data journalism thanks to the workshops, whereas 41 participants said the same of open data.

 

For the terms open government, data portal and data visualization, the participants who said they learned them through the data journalism training outnumbered those who said they already knew them. In numbers, 70 of the 110 participants learned about data portals through the workshops, 53 about open government, and 51 about data visualisation.

 

The good news is that the number of participants in the 18-25 age range who learned data journalism terms thanks to the workshops is more than twice the number of those who already knew them. These statistics underline that the workshops help participants understand the terminology.

 

90 percent of participants like the data journalism workshops

 

99 participants (90%) said that they liked the data journalism training. Seven were undecided and four said they didn’t like it. The participants appreciated not only the workshops but also the instructors, the content of the training and the other guests.
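As a quick check on the arithmetic, the shares above can be recomputed from the reported counts. This is only an illustrative Python sketch: the `shares` helper and the response labels are assumptions introduced here, not part of the original survey analysis.

```python
def shares(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts reported in the text: 99 liked the training, 7 were undecided,
# 4 did not like it (110 respondents in total).
liked_training = {"liked": 99, "undecided": 7, "did not like": 4}

result = shares(liked_training)
print(result)  # {'liked': 90.0, 'undecided': 6.4, 'did not like': 3.6}
```

The same helper can be applied to any other question in the survey, provided the counts for every answer option are available.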

 

As the data indicate, 87.1% of people were very pleased with the content of the workshops, 102 participants said they liked the guests, and 94 said they appreciated the data journalism instructors. Most participants had a positive attitude towards the content, instructors and guests of the workshops. There were also participants who remained undecided or did not appreciate some features of the data journalism training.

 

 

 

52.8 % of participants did not like the duration of the workshops

 

 

While 30 participants (27.2%) said that they liked the duration and timing of the workshops, 28 people (25.4%) were undecided. So 52.8% of participants, the remaining 58 people, were not pleased with the length of the workshops.

 

On the other hand, there was a negative perception of the infrastructure and internet network among the participants. 54.1% of participants pointed out that they didn’t appreciate the internet network provided for the data journalism activities.

 

By contrast, most of the participants were satisfied with the accommodation (65.4%), transportation (74.1%) and catering services (66.3%) supplied during the workshops.

 

More than half of the participants heard about workshops via social media

 

The question was ‘How did you find out about the workshops, and where do you get workshop news and announcements?’ Participants said that they found out about and kept in touch with the workshops predominantly via social media (54%), followed by the website (35.4%), e-mail (30%), instructors at universities (26.3%) and friends (17.7%).

 

    Digital communication channels, especially social media and e-mail, seem to have played an important role in reaching participants and keeping them informed.

53 percent of Participants stated: I can make data visualizations

 

 

More than half of the participants said that they can create data visualizations thanks to the workshops, while 51% of them, a total of 57 participants, said that the workshops introduced them to, and helped them work with, various kinds of data sources. Also, 41% of participants stated that they have developed their data analysis skills, and 40% said that they can work with data thanks to the workshops.

 

 

 

All we need is longer workshops

 

The last two questions of the survey asked participants what they would suggest to improve the data journalism workshops and to increase and spread data literacy.

 

87% of the participants, a total of 86 people, said that they need longer-term workshops built around producing data journalism projects. This was the most supported suggestion. The other options were cooperation with journalism associations, inviting international data journalists to workshops, and arranging MOOC programmes to increase efficiency and reach more people.

 

 

    The university curriculums need data literacy courses  

In order to increase data literacy, 74.5% of participants indicated that university curricula need more data literacy and data journalism courses. The participants suggest that these courses could be added to current education plans. Also, 60.9% of them think the key is open data: if the government shared more sources of open data, that could improve data literacy in Turkey.

 

 

There are other suggestions too, for example cooperation with journalists, NGOs and developers, and government funding to help create a data-savvy generation.

 

 

Last but not least, we asked all the participants, ‘If you had the chance, would you want to attend more workshops?’ 103 of 110 participants said yes, they would.

 

Methodology

A quantitative method with survey data-gathering techniques was used for this research. Participants were reached via e-mail, and a Google Form was used for the questionnaire. The population of this research is the participants of the last 10 data journalism workshops. Because attendance varied between 10 and 20 participants per workshop, the number of participants was taken as 15 per workshop, so the research universe is 150 people. The sample size was calculated with a 95% confidence level and a 5% margin of error, giving a required sample of 109 participants.
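The sample-size figure can be reproduced with Cochran's formula combined with a finite population correction, a common way to size a survey sample. The text does not say which formula the authors used, so the choice of formula, the `sample_size` helper and the assumed response proportion p = 0.5 are all assumptions made for this sketch. Under those assumptions, a population of 150 yields the 109 participants quoted above.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Cochran's sample size with finite population correction (assumed method)."""
    # Two-sided z-score for the confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Required sample size for a very large population.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Correct for the small, finite population and round up to whole people.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


# Research universe from the text: 10 workshops x 15 participants = 150 people.
print(sample_size(150))  # 109
```

Using p = 0.5 is the conservative choice: it maximises p(1 - p) and therefore gives the largest required sample for a given margin of error.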

 

Tableau, Infogr.am and Google Charts were used for the data visualisations.

Research datasets: https://drive.google.com/open?id=0Bxz1Zy_R9wbONEMxWWJucHAwVlE

Research questionnaire: https://drive.google.com/open?id=0Bxz1Zy_R9wbOek14Y09QQUk2NjA

 
