Mapping | School of Data - Evidence is Power

The Data Journalism Bootcamp at AUB Lebanon

Ali Rebaie - January 29, 2015 in Events, Fellowship

Data love is spreading like never before. Unlike the previous workshops we ran in the MENA region, the one that began on 18 January 2015 was an intensive, four-day data journalism workshop at the American University of Beirut, held in collaboration with Dr. Jad Melki, Director of the media studies program at AUB. The team at Data Aurora was really happy to share this experience with students from different academic backgrounds, including media studies, engineering and business.

The workshop was mainly led by Ali Rebaie, a Senior School of Data fellow, and Bahia Halawi, a data scientist at Data Aurora, along with the data community team assistants: Zayna Ayyad, Noor Latif and Hsein Kassab. The aim of the workshop was to introduce the students to the world of open data, and data journalism in particular, through tutorials on the open source tools and methods used in this field. Moreover, we wanted to put students on track regarding the use of data.

On the first day, the students were introduced to data journalism from a theoretical angle, in particular the data pipeline, which outlines the different phases of any data visualization project: find, get, verify, clean, analyze and present. After that, students got hands-on with scraping and cleaning data using tools such as OpenRefine and Tabula.

Day two was all about mapping, from mapping best practices to mapping formats and shapes. Students were first exposed to different types of maps and the design styles that serve the purpose of each map. The best mapping techniques and visualizations were then highlighted, along with the purpose each one serves. By the end, participants were able to tell dot maps from choropleth maps, as well as many others. Then they used Twitter data that contained geolocations to contrast varying tweeting zones by placing these tweets at their origins on CartoDB. Similarly, they created other maps using QGIS and TileMill. The mapping exercises were really fun and students were very happy to create their own maps without a single line of code.

On the third day, Bahia gave a lecture on network analysis, covering some important mathematical notions needed for working with graphs as well as possible uses and case studies from the field. Meanwhile, Ali presented different open data portals to give the students more resources and data sets. After these topics were covered, there was a technical demonstration of how to use a network analysis tool on two topics. Students first analyzed climate change; later, the AUB media group on Facebook was analyzed and its graph drawn. It was very cool to find out that one of the top influencers in that network was among the students taking the training. Students were also taught to do the same analysis for their own friends’ lists. The Facebook data was collected and the visualizations were drawn in a network visualization tool.

After completing the interactive types of visualizations, the fourth day was about static ones, mainly infographics. Each student had the chance to extract the information needed for an interesting topic and transform it into a visual piece. Bahia worked with the students, teaching them how to refine the data so that it becomes simple and short, and thus usable for building the infographic design. Later, Yousif, a senior creative designer at Data Aurora, trained the students on the use of Photoshop and Illustrator, two of the tools commonly used by infographic designers. At the end of the session, each student submitted a well done infographic, some of which are posted below.

After the workshop, Zayna talked briefly with the students to get their feedback. Here are some of the opinions she collected:

“It should be a full course, the performance and content was good but at some point, some data journalism tools need to be more mature and user-friendly to reduce the time needed to create a story,” said Jad Melki, Director of the media studies program at AUB. “It was great overall.”

“It’s really good but the technical parts need a lot of time. We learned about new apps. Mapping, definitely I will try to learn more about it,” said Carla Sertin, a media student.

“It was great we got introduced to new stuff. Mapping, I loved it and found it very useful for me,” said Ellen Francis, civil engineering student. “The workshop was a motivation for me to work more on this,” she added, “it would work as a one semester long course.”

Azza El Masri, a media student, is interested in doing an MA in data journalism. “I like it I expected it to be a bit harder, I would prefer more advanced stuff in scraping,” she said.

Tags: Data journalism, Gephi, Mapping

Instigating the Rise of Demand for Data: The #OpenData Party in Abuja

olubabayemi - December 8, 2014 in Events

So what happens when you have 102 Nigerians, representing all six regions of the country, in Abuja to teach and learn about what they can use data, or open data, for? “It was an action-packed, idea generating, brain storming, mind grooming which will help me in my advocacy as well as in tracking how the budget of my country is being spent, a challenging and yet fun-filled event,” as described by Clinton Ezeigwe of People to People International. “As someone working in a non-government organization, this event has boost my knowledge on data sourcing, data collection, data analysis, and will help me in mapping my work environment,” said Aniekan Archibong of Partners for Peace in Akwa Ibom state.

What participants said about the 2-day event

The 2-day event, held on Friday, November 28 and Saturday, November 29, 2014 at the African University of Science and Technology, was meant to raise awareness of how NGOs can use available data to monitor service delivery in the health sector; empower journalists to use data to create compelling stories that can cause change; and, during the training itself, create a platform that can be used to monitor service delivery in the health sector. “We will be most interested in how citizens turned professionals like you all here, can take up stories from the data that will be curated during this event, in asking government questions about inputs in the health sector, and other sectors as well,” said Christine K, Country Director of Heinrich Boell Stiftung Nigeria, during her keynote at the event.

On the minds of many participants was how we fit into this new world of Open Data, with a party at the end. Did you ever wonder why the party? Well, to clear the air, we started the “party” by helping participants understand what data means to us and to them, and what it can change in the life of that curious woman who walks 30km from Keta to Goronyo to join an antenatal care program, or for that hardworking man who has to travel from Potiskum to Kaduna to get a Hepatitis C viral load test, even though he had to borrow the 23,000 Naira needed for the test. Yes, available and structured data can create a great story out of this recurring event. “If you are still looking for what could then happen from the gathering of these 102 participants – it’s all written in gold here, even though these are still stories in the making, but we can do much more,” exclaimed Anas Sani Anka of the Nigeria Television Authority in Gusau, Zamfara.

Adam Talsma of Reboot sharing skills that can make data matter to people on ground

Going through the data pipeline (data sourcing, collection, collation, analysis, reporting and use), we got a familiar shock: only 2% of the participants knew where to quickly find the available federal government budget data in Nigeria. While the data pipeline session was meant to guide participants through the data management process in a participatory manner, it was also an opportunity to share where available data can be found online in the country, and how it can be used in advocacy and storytelling to start conversations around transparency and accountability, and to exchange feedback between the people and government.

Leading the skill share session were Adam Talsma of Reboot, who took participants through using formhub and textit, and Michael Egbe of eHealth Africa, who introduced participants to how they are mapping Nigeria using OpenStreetMap. The storytelling sessions featured Tina Armstrong, an award-winning data journalist who is interested in telling the stories of vulnerable communities using data, and Joshua Olufemi, who shared the skills and tools that have made Premium Times the best online investigative media outlet in the country. The session was rounded off by Ledum of Right to Know, who showed participants how to use the Freedom of Information Act to get data from the government.

Joshua Olufemi of Premium Times Nigeria sharing skills on telling stories with data

The high point of the first day was the “I want to learn, and I want to teach” session, a remix of the School of Data Summer Camp World Cafe and Skill Share Session. “Learning particular skills in 10 minutes can be mind blowing and something I will not want to forget in a long time, I only hope we could have had more time other than the 30 minutes for the 10 min/skill session,” narrated Michael Saanu of Africa Hope Foundation. Among the skills taught were using Microsoft Excel for analysis, creating survey forms using Google Forms, collaboration techniques with Google Drive, writing funding proposals, community building, using Twitter and Facebook for advocacy, and data scraping using Tabula. After this session, it was clear that participants wanted to be part of all the sessions, but they were limited to three, as the night crept in faster than we expected. What an energetic way to end the first day!

Participants using sticky notes to choose what to learn and what to teach

Day 2 kicked off, with the sun and expectations both high, with lessons from participants and an ice breaker on the power of around leadership. The day was dedicated to OpenStreetMap and a data sprint on funds meant for inputs in the health sector, moving from scraping the data from the budget office to visualizing it and creating a monitoring instrument among the participants. Working through the available health facility data for Goronyo in the NMIS data, we found that some of the Goronyo data was not accurate. So if we can’t use that, how do we get the government health facility data? Most participants in this group concluded that the DHIS2 data could be more reliable, but its usage still remains difficult! Does anyone want to help get geo-referenced health facility data for Goronyo? Please comment here. Not giving up, Sidney Bamidele of eHealth Africa trained participants on how to add and edit points on OpenStreetMap and how to create task managers on HOTOSM.

Sidney Bamidele of eHealth Africa training participants on using OpenStreetMap

Nevertheless, the data sprint, with music and drinks, took the whole day, and I couldn’t stop hearing: “OMG! So 20 million was budgeted for the construction of this health facility in my LGA, how come it is still at this state, I think we need to go and ask”; “I have found that so many time, descriptions of budget data has been duplicated – and how do we stop this”. As has always been the case, only one sprinter had an Apple laptop out of the 50 laptops on the tables. Most of the participants agreed that only 30% of Nigerians own a smartphone, so how many will use it, and how many will use Android or that new Android app you are about to make? Maybe the future of mobile activism in the country still lies in feature phones. These and many more are conversations that always ensue during the training and data sprint sessions I have facilitated. In the end, what did we make? An Ushahidi Crowdmap instance showing where funds for health inputs will go: a first step in starting a conversation around monitoring service delivery in that sector.

Participants during the Mapping and Data Sprint

What next? In the words of Hamzat Lawal, the Chief Executive of Connected Development [CODE], it is important that we brace up and start using the data on this platform to ask questions, directed not only to the government on whether the items described in the budget data got to the citizens they were meant for, but also to those citizens, on facility and health input usage and quality. As a School of Data Fellow, I have learnt that citizens need basic tools and skills to hold government accountable. As a monitoring and evaluation expert, I can see that in a few years lots of data will be released (even though most wouldn’t be responsible), but how citizens will identify and use the reliable ones remains a herculean task. As a human being, I learned how hardworking and brave my colleagues and participants are. At no time did I feel that facilitating data trainings was futile. Ultimately, what I really learned about data, or open data, or available data is that NGOs, journalists, activists and governments still need more capacity building around this phenomenon.

Pictures from this event are on Flickr

Tags: Mapping

The World Tweets Nelson Mandela’s Death

Ali Rebaie - December 10, 2013 in Data Stories

Data visualization is awesome! However, it achieves its goal only when it tells a story. This weekend, Mandela’s death dominated the Twitter world and hashtags mentioning Mandela were trending worldwide. I decided to design a map that would show how people around the world tweeted about the death of Nelson Mandela. First, I started collecting tweets associated with #RIPNelsonMandela using ScraperWiki. I collected approximately 250,000 tweets on the day of Mandela’s death. You can check this great recipe on the School of Data blog on how to extract and refine tweets.

After the step above, I refined the collected tweets and uploaded the data into CartoDB. It is one of my favorite open source mapping tools and I will make sure to write a CartoDB tutorial in future posts. I used the bubble, or proportional symbol, map, which is usually better for displaying raw data. Different areas had different tweeting rates, and this reflected how different countries reacted. Countries like South Africa, the UK, Spain, and Indonesia had higher tweeting rates. The diameter of each circle represents the number of retweets. As for the colors, the darker a circle appears, the higher the intensity of tweets.

That’s not the whole story! It is easy to notice that some areas, such as Indonesia and Spain, have high tweeting rates. After researching the topic, it was quite interesting to learn that Mandela had a unique connection with Spain, one forged during two major sporting events. In 2010, Nelson Mandela was present in the stadium when Spain’s international football team won their first ever World Cup trophy. Moreover, for Indonesians, Mandela has always been a source of joy and pride, especially as he was fond of batik and often wore it, even in his international appearances.

All in all, it was evident that interesting insights can be explored and that such data visualizations can help us show the big picture. They also highlight events and facts that we would not be aware of in the traditional context.

Tags: Mapping

Data Roundup, 3 December

Marco Menchinella - December 3, 2013 in Data Roundup

A course on online mapping, new visualization software, corruption perceptions data, bushfires in Australia through interactive maps, climate change effects infographics, the top 5 tweets of November in data visualization, a gift list for data lovers.

United Nations Photo – Climate Change Effects in Island Nation of Kiribati

Tools, Events, Courses

If you are a wannabe mapper and you need to acquire the skills to manage your digital exploration tools, you might be interested in registering for CartoDB’s “Online mapping for beginner” course, starting on December 3rd. Hurry up: only a few places left!

Daniel Smilkov, Deepak Jagdish and César Hidalgo are three MIT students who developed a visualization tool called Immersion. Immersion helps you visualize your network of e-mail contacts using only the “From”, “To” and “Cc” fields, without taking into account any kind of content.

JavaScript is one of the programming languages most commonly used to create beautiful visualizations. Follow this tutorial from dry.ly if you want to learn it bypassing D3.js. Practice makes perfect!

Data Stories

Yesterday, Transparency International launched its annual Corruption Perceptions Index (CPI), ranking countries according to perceived levels of corruption. Have a look at the results and see how your country ranks.

Everyone knows what a bar chart is but have you ever heard about trilinear plots? This post from Alberto Cairo introduces a short consideration on new forms of data representations and on when to break conventions in information design.

The goal of the Digital Methods Initiative of the Amsterdam University is to map and analyze causes, effects and possible future scenarios deriving from climate change. As part of this project, the students from Density Design Research Lab wrote a wonderful post outlining their visual design take on climate change.

Gender inequality is one of those big issues which vary enormously from country to country. If you are wondering which countries have the worst gender gap, take a look at the map published in Slate Magazine by Jamie Zimmerman, Nicole Tosh, and Nick McClellan.

There are a lot of visualizations you can make from data coming from social networks, especially from those coming from the biggest one: Facebook. Take a minute to see those posted in this curious article from Armin Grossenbacher: “Visualising (not so) Big Data”.

In Australia bushfires occur frequently. Look at the amazing interactive story that The Guardian published on their history, showing maps with data on temperatures, hectares of land burnt and number of properties damaged.

Not everyone knows that we just passed World AIDS Day on the first of December. Tariq Khokhar reminds us of the global situation of the disease in this article from the World Bank Data Blog.

Data Sources

Datavisualization.fr extracted the list of the 5 most influential tweets of November containing the hashtag #dataviz from a database of about 10,100 posts. Read it here and see who did best.

Christmas is getting closer. If you need some good suggestions on what to buy for your friends and parents, take a moment to read the FlowingData Gift Guide and you’ll find some interesting data-gifts for your data-lovers.

Tags: Australia, Gender Data, Mapping

Learning Lunches

Neil Ashton - November 28, 2013 in

Photo credit: Rachel Andrew

Noah Veltman, OpenNews Fellow at the BBC, wrote a series of Learning Lunches and generously allowed School of Data to republish them.

These Learning Lunches are a series of tutorials that aim to bridge the context gap and to demystify technical topics that come up often in newsroom development. They are a high-level but concrete discussion of technologies intended to foster more productive collaboration among developers, journalists, and designers. They involve as little code as possible but provide ample guidance for those who want to learn more.

Tags: Mapping, Scraping, SQL

Netneutralitymap.org – converting 700Gb of data to a map.

Michael Bauer - September 6, 2012 in Data Stories

In this post, Michael Bauer explains how he wrangled data from Measurement Lab in order to create this visualisation of netneutrality, which maps interference with end user internet access.

Netneutrality describes the principle that data packets should be transported as best as possible over the internet without discrimination as to their origin, application or content. Whether or not we need legal measures to protect this principle is currently under discussion in the European Union and in the US.

The European Commission have frequently stated that there is no data to demonstrate the need for regulation. However, this data does exist: The measurement-lab initiative has collected large amounts of data from tests designed to detect interference with end user internet access. I decided to explore whether we can use some of the data and visualize it. The result? netneutralitymap.org.

Data acquisition: Google Storage

Measurement Lab releases all their data under CC-0 licenses. This allows other researchers and curious people to play with the data. The datasets are stored as archives on Google Storage. Google has created a set of utilities to retrieve them: gsutil. If you are curious about how to use it, look at the recipe in the School of Data Handbook.

All the data that I needed was in the archives, alongside a lot of data that I didn’t need. So I ended up downloading 2 GB of data for a single day of tests, and actually only using a few megabytes of it.

Data Wrangling: Design decisions, parsing and storage

Having a concept of how to do things when you are starting out always pays. I wanted to keep a slim backend for the data and do the visualizations in the browser. This pretty much reduced my options to either having a good API backend or static JSON. Since I intend to update the data once a day, I only need to analyze and generate the data for the frontend once a day. So I decided to produce static JSON files using a specific toolchain.

For the toolchain I chose Python and PostgreSQL: Python for parsing, merging etc., and Postgres for storage and analysis. Using SQL-based databases for analysis pays off as soon as you get a lot of data, and I expected a lot. SQL is often considered slow, but for this kind of analysis it is a lot faster than doing the same work in Python.

The first thing to do was parsing. The test I selected was Glasnost, a test suite that emulates different protocols to transfer data and tries to detect whether these protocols are shaped. Glasnost stores very verbose logfiles, which state the results in a nicely human-readable format, so I had to write a custom parser to extract them. There are many ways of writing parsers; I recently decided to use a more functional style, using Python’s itertools and treating the file as a stream. The parser simply fills up the SQL database. But it has one more function: since we want to be able to distinguish countries, the parser also looks up the country belonging to the IP of the test using pygeoip and the GeoLite GeoIP database.
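As a rough illustration of the stream-oriented, itertools-based style described above (this is not the parser used in the project), here is a minimal sketch. The log layout is an assumption made purely for the example: blocks of "key: value" lines separated by blank lines. Real Glasnost logs look different, and the function name parse_records is hypothetical.

```python
import io
from itertools import groupby


def parse_records(lines):
    """Lazily yield one dict per blank-line-separated block of 'key: value' lines."""
    stripped = (line.strip() for line in lines)
    # groupby splits the stream into alternating runs of blank and non-blank lines,
    # so the whole file never has to be held in memory at once.
    for is_blank, block in groupby(stripped, key=lambda line: line == ""):
        if is_blank:
            continue
        record = {}
        for line in block:
            key, sep, value = line.partition(":")
            if sep:  # ignore lines that are not in 'key: value' form
                record[key.strip()] = value.strip()
        if record:
            yield record


# Hypothetical sample input; the field names are made up for illustration.
sample = io.StringIO(
    "client: 192.0.2.1\nprotocol: bittorrent\nrxrate: 120\ntxrate: 400\n"
    "\n"
    "client: 198.51.100.7\nprotocol: http\nrxrate: 390\ntxrate: 400\n"
)
records = list(parse_records(sample))
```

Because parse_records is a generator, each record could be handed to a database insert as soon as it is parsed, which is what makes the streaming approach attractive for very verbose logs.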

Once the table was filled, I wanted to know which providers the IPs of the test clients belonged to. So I added an additional table and started to look up ASNs. An Autonomous System Number (ASN) identifies the network that announces a block of Internet addresses, which tells us who currently operates the block. To look them up I used the Python module cymruwhois (which queries whois.cymru.com for information). Once this step was complete, I had all the data I needed.

Analysis: Postgres for the win!

Once all the data is assembled, analysis needs to be done. The Glasnost team previously used a quite simple metric to determine whether interference was present or not, and I decided to use the same one. I created some new tables in Postgres and started working on the queries. A good strategy is to do this iteratively: figure your subqueries out and then join them together in a larger query. This way things like:

-- Store, per client and test type, the lowest rx/tx rate ratio observed.
INSERT INTO client_results
SELECT id, ip, asn, time, cc, summary.stest, min(rate)
FROM client
INNER JOIN (
    SELECT test.client_id, test.test, max(rxrate)/max(txrate) AS rate,
           mode, test.test AS stest
    FROM result, test
    WHERE test_id = test.id
    GROUP BY test.client_id, mode, test.port, test.test
    HAVING max(txrate) > 0
) summary ON summary.client_id = id
WHERE id NOT IN (SELECT id FROM client_results)
GROUP BY client.id, client.asn, client.time, client.cc, client.ip, summary.stest;

don’t seem too scary. In the map I wanted to show the percentage of tests in which interference with a user’s internet connection took place, both by country and by provider. The total number of tests, the number of tests in which interference was detected, and the percentage of ‘interfered with’ tests are thus calculated for each country and for each provider. The Glasnost suite offers multiple tests for different applications, so the results are then further broken down by application. Since this is run once a day I didn’t worry too much about performance. With all the data, calculating the results takes a couple of minutes, so no realtime queries here.
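The per-country figures described above (total tests, tests with interference, and the resulting percentage) can be sketched as follows. In the project this aggregation was done in Postgres; the version below uses plain Python purely for illustration, and the function name, the input shape and the sample data are all assumptions.

```python
from collections import defaultdict


def interference_share(results):
    """Aggregate (country_code, interfered) pairs into per-country statistics."""
    totals = defaultdict(int)
    interfered = defaultdict(int)
    for country_code, was_interfered in results:
        totals[country_code] += 1
        if was_interfered:
            interfered[country_code] += 1
    return {
        cc: {
            "tests": totals[cc],
            "interfered": interfered[cc],
            "percent": round(100.0 * interfered[cc] / totals[cc], 1),
        }
        for cc in totals
    }


# Made-up sample: four tests from AT (one with interference), two from DE (one with).
sample_results = [
    ("AT", True), ("AT", False), ("AT", False), ("AT", False),
    ("DE", False), ("DE", True),
]
stats = interference_share(sample_results)
```

The same function would work per provider by passing ASNs instead of country codes as the first element of each pair.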

The next step is to simply dump the results as JSON files. I used Python’s json module for this and it turned out to work beautifully.
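A minimal sketch of such a dump with Python’s standard json module might look like this. The file name, the helper name dump_country_values and the values are made up for illustration; the flat object keyed by country code is an assumption about what a simple choropleth frontend would want to load.

```python
import json
import os
import tempfile


def dump_country_values(values, path):
    """Write a flat {country_code: value} mapping to a static JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        # sort_keys keeps the output stable between daily runs
        json.dump(values, fh, sort_keys=True)


# Hypothetical percentages; in practice these would come from the analysis step.
percent_by_country = {"AT": 25.0, "DE": 50.0}
out_path = os.path.join(tempfile.mkdtemp(), "countries.json")
dump_country_values(percent_by_country, out_path)

# Read the file back to confirm it round-trips.
with open(out_path, encoding="utf-8") as fh:
    loaded = json.load(fh)
```

Since the files are static, they can be regenerated once a day and served by any plain web server, with no API backend involved.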

Visualization: Jvectormap

For visualization I imagined a simple choropleth map with color-coded values of interference, and I looked around for ways to build it. Using OpenStreetMap/Leaflet seemed too cumbersome, but on the way I stumbled across jVectorMap, a jQuery-based map plugin, and decided to use it. It simply takes data in the form {"countrycode": value} and displays it, taking care of color-coding etc. A detailed recipe on how to create similar maps can be found in the School of Data Handbook. Once I had the basic functionality down (e.g. displaying country details when a country is clicked), it was time to call in my friends. Getting feedback early helps when developing something like the map. One of my friends is a decent web designer, so he looked at it and immediately replaced my functional design with something much nicer.

Things I’ve learned creating the project:

SQL is amazing! Building queries iteratively makes things easier and results in mind-boggling queries.
Displaying data on maps is actually easy (this is the first project where I did so).
700 GB of data is a logistical challenge (I couldn’t have done it from home; thanks to friends at “netzfreiheit.org” for giving me access to their server).

If you’re interested in the details: check out the github repository.

Michael Bauer is a new datawrangler with the OKFN. He’ll be around schoolofdata.org and okfn labs. In his free time he likes to wrangle data from advocacy projects and research.

Tags: Mapping, Python, SQL