You are browsing the archive for Gephi.

The Data Journalism Bootcamp at AUB Lebanon

- January 29, 2015 in Events, Fellowship

Data love is spreading like never before. Unlike the previous workshops we ran in the MENA region, this one was an intensive four-day data journalism workshop at the American University of Beirut (AUB), starting on 18 January 2015 and organised in collaboration with Dr. Jad Melki, Director of the media studies program at AUB. The team at Data Aurora was really happy to share this experience with students from a range of academic backgrounds, including media studies, engineering and business.

The workshop was led mainly by Ali Rebaie, a Senior School of Data fellow, and Bahia Halawi, a data scientist at Data Aurora, along with the data community team assistants: Zayna Ayyad, Noor Latif and Hsein Kassab. The aim was to introduce the students to the world of open data and, in particular, data journalism, through tutorials on the open source tools and methods used in the field. We also wanted to set the students on track in their own use of data.

On the first day, the students were introduced to data journalism from a theoretical angle, in particular the data pipeline, which outlines the phases of any data visualization project: find, get, verify, clean, analyze and present. After that, the students got hands-on practice scraping and cleaning data with tools such as OpenRefine and Tabula.

Day two was all about mapping, from mapping best practices to map formats and shapes. Students were first shown different types of maps and the design styles that suit the purpose of each. Best mapping techniques and visualizations were emphasized, along with the purpose each one serves. By the end, participants could tell dot maps from choropleth maps, among many others. They then used Twitter data containing geolocations to contrast tweeting zones, placing the tweets at their origins on CartoDB. Similarly, they created other maps using QGIS and TileMill. The mapping exercises were really fun, and the students were very happy to create their own maps without writing a single line of code.

On the third day, Bahia gave a lecture on network analysis, covering some important mathematical notions needed for working with graphs, as well as possible uses and case studies in the field. Meanwhile, Ali introduced different open data portals to give the students more resources and data sets. After these topics were covered, a technical demonstration showed how to use a network analysis tool on two topics. Students first analyzed data on climate change; later, the AUB media group on Facebook was analyzed and its graph drawn. It was very cool to find out that one of the top influencers in that network was among the students taking the training. Students were also taught to run the same analysis on their own friends’ lists: the Facebook data was collected and the visualizations were drawn in a network visualization tool.

After covering interactive visualizations, the fourth day was about static ones, mainly infographics. Each student had the chance to extract the information needed on an interesting topic and transform it into a visual piece. Bahia worked with the students, teaching them how to refine the data so that it became simple and short, and thus usable for building the infographic design. Later, Yousif, a senior creative designer at Data Aurora, trained the students to use Photoshop and Illustrator, two of the tools commonly used by infographic designers. At the end of the session, each student submitted a well-made infographic, some of which are posted below.

After the workshop, Zayna had short talks with the students to get their feedback; here are some of their opinions:

“It should be a full course. The performance and content were good, but at some point some data journalism tools need to be more mature and user-friendly to reduce the time needed to create a story,” said Jad Melki, Director of the media studies program at AUB. “It was great overall.”

“It’s really good, but the technical parts need a lot of time. We learned about new apps. Mapping, definitely: I will try to learn more about it,” said Carla Sertin, a media student.

“It was great; we got introduced to new stuff. Mapping, I loved it and found it very useful for me,” said Ellen Francis, a civil engineering student. “The workshop was a motivation for me to work more on this,” she added. “It would work as a one-semester course.”

Azza El Masri, a media student, is interested in doing an MA in data journalism. “I liked it. I expected it to be a bit harder; I would prefer more advanced stuff in scraping,” she said.



An Introduction to Mapping Company Networks Using Gephi and OpenCorporates, via OpenRefine

- November 15, 2013 in Uncategorized

As more and more information about beneficial company ownership is made public under open license terms, we are likely to see an increase in the investigative use of this sort of data.

But how do we even start to work with such data? One way is to try to start making sense of it by visualising the networks that reveal themselves as we start to learn that company A has subsidiaries B and C, and major shareholdings in companies D, E and F, and that those companies in turn have ownership relationships with other companies or each other.
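Ownership relationships like these form a directed network: each link runs from an owner to a company it owns. The minimal Python sketch below stores such links as an edge list and follows them to find every company that sits below company A in an ownership chain. The company names and the `owns` mapping are hypothetical stand-ins that mirror the example above, not OpenCorporates data.

```python
from collections import deque

# Hypothetical ownership links mirroring the example above:
# owner -> companies it has subsidiaries in or major shareholdings in.
owns = {
    "A": ["B", "C", "D", "E", "F"],
    "D": ["G"],  # those companies own others in turn...
    "E": ["F"],  # ...or hold stakes in each other
}

def edge_list(graph):
    """Flatten the mapping into (owner, owned) pairs, one per ownership link."""
    return [(owner, owned) for owner, targets in graph.items() for owned in targets]

def downstream(graph, start):
    """Breadth-first search for every company reachable from `start` via ownership links."""
    seen, queue = set(), deque([start])
    while queue:
        company = queue.popleft()
        for owned in graph.get(company, []):
            if owned not in seen:
                seen.add(owned)
                queue.append(owned)
    return seen

print(edge_list(owns))
print(sorted(downstream(owns, "A")))  # → ['B', 'C', 'D', 'E', 'F', 'G']
```

Each (owner, owned) pair is the kind of row that a network tool such as Gephi can load as a single link.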

But how can we go about visualising such networks?!

This walkthrough shows one way, using company network data downloaded from OpenCorporates using OpenRefine, and then visualised using Gephi, a cross-platform desktop application for visualising large network data sets: Mapping Corporate Networks – Intro (slide deck version).

The walkthrough also serves as a quick intro to the following data wrangling activities, and can be used as a quick tutorial to cover each of them.

  • how to hack a web address/URL to get data-as-data from a web page (this doesn’t work everywhere, unfortunately);
  • how to get company ownership network data out of OpenCorporates;
  • how to download JSON data and get it into a nice spreadsheet/tabular data format using OpenRefine;
  • how to filter a tabular data file to save just the columns you want;
  • a quick intro to using the Gephi network visualisation tool;
  • how to visualise a simple data file containing a list of how companies connect, using Gephi.
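The walkthrough does the JSON-to-table, column-filtering and export steps in OpenRefine. For comparison, here is a minimal Python sketch of the same three steps.

It makes two assumptions:

  • The JSON shape (a "relationships" list with "parent", "child", "kind" and "since" keys) is hypothetical. Real OpenCorporates responses are structured differently, so inspect them and adapt the key names.
  • The "Source"/"Target" header is, as far as we know, what Gephi's Data Laboratory spreadsheet import looks for in an edges table.

```python
import csv
import io
import json

# Hypothetical JSON shape; real OpenCorporates responses differ, so adapt the keys.
SAMPLE = """
{"relationships": [
  {"parent": "Acme Holdings Ltd", "child": "Acme Retail Ltd", "kind": "subsidiary", "since": "2009"},
  {"parent": "Acme Holdings Ltd", "child": "Acme Shipping Ltd", "kind": "shareholding", "since": "2011"}
]}
"""

def json_to_rows(text):
    """Parse the JSON and return one flat dict per relationship (one table row each)."""
    return json.loads(text)["relationships"]

def keep_columns(rows, columns):
    """Filter each row down to just the columns we want, in the given order."""
    return [[row.get(col, "") for col in columns] for row in rows]

def to_edge_csv(rows):
    """Write a two-column edge list: who owns whom."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Source", "Target"])
    writer.writerows(keep_columns(rows, ["parent", "child"]))
    return out.getvalue()

print(to_edge_csv(json_to_rows(SAMPLE)))
```

To save the result, write the returned text to a file opened with open(path, "w", newline="").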

Download it here: Mapping Corporate Networks – Intro.

So if you’ve ever wondered how to download JSON data so you can load it into a spreadsheet, or how to visualise how two lists of things relate to each other using Gephi, give it a go… We’d love to hear any comments you have on the walkthrough too (what you liked, what you didn’t, what’s missing, what’s superfluous, what worked well for you, what didn’t and, most of all, what use you put anything you learned from the tutorial to!) :-)

If you would like to learn more about working with company network data, see the School of Data blogpost Working With Company Data which links to additional resources.


Data Roundup, 25 October

- October 25, 2013 in Data Roundup

The English Silicon Valley map, Little Data economics for the news industry, the New York Data Week and Strata Conference, an infographic on movies’ supercars, workshops and new databases.


Tools, Events, Courses

Interested in joining and developing a data journalism project? Medialab Prado is looking for collaborators for its “Workshop on Data Journalism: Transforming Data into Stories”. Participants will work in groups on selected projects ranging from “Globalization and health trends” to “Climate Finance Maps”. The workshop runs in two editions: 25-27 October and 13-15 December. Hurry up! The deadline for registration is October 24.

If you are curious about the structure of your Facebook network, you may want to have a look at the first DataJLab video tutorial on Gephi. Gephi is a platform that helps you visualize complex networks of relationships and, above all, it is available for free to anyone!

Next week, New Yorkers should not miss two of the biggest events in the world of data. NYC Data Week starts on Monday the 27th and, the very next day, the Strata Conference opens its doors to the public. It’s going to be an intensive agenda of workshops, talks and meetups for anyone interested in analyzing and visualizing numbers and statistics: journalists, information architects, designers, entrepreneurs, startup founders and many more.

Data Stories

The legendary Guardian Data Blog recently published an interesting analysis of the diversity of languages spoken in England. In “What does the 2011 Census tell us about diversity of languages in England and Wales?”, the author, University College London geographer Guy Lansley, shows the distribution of languages across the country through a series of dot maps based on data released by the Office for National Statistics.

If you are wondering what role data analysis and data intelligence play in big news organizations nowadays, then you should absolutely read Ken Doctor’s point of view on the Nieman Journalism Lab, where he describes and presents “The newsonomics of Little Data”.

Want to know where England’s Silicon Valley is? Read and explore John Burn-Murdoch’s map of Britain’s technology sector hotspots in the Financial Times.

For those with a true passion for cars and movies, Cool Infographics posted “Car of the Silver Screen”, a long, nice-looking graphic showing the supercars of the most famous film characters: from Sean Connery’s legendary Aston Martin DB5 in “007 Goldfinger” to the recent Audi R8 e-tron driven by Robert Downey Jr. in “Iron Man 3”.

Data Sources

Data journalists from La Nacion just released the beta version of Declaraciones Juradas Abiertas, a huge database listing the assets, holdings and properties of Argentinian public servants, aimed at making public administration more transparent to citizens.


Exploring IATI funders in Kenya, Part II – cleaning & visualizing the data

- August 22, 2013 in Uncategorized

Welcome back to a brief exploration of who funds whom in Kenya based on freely available data from IATI (International Aid Transparency Initiative).

In the last post, we extracted data from IATI using Python. In this post, we’ll clean that data up and visualize it as a network graph to see what it can tell us about aid funding in Kenya.

If you couldn’t follow the code in the last post, don’t worry: we’ll use the GUI tools OpenRefine and Gephi from now on. You can download the result of last post’s data extraction here.

Data cleaning: OpenRefine

First, let’s clean up the data using Refine. Start Refine and create a new project from the data file. The first things we’ll do are to tidy up both columns and to bring the entries to a common case (I prefer title case). We do this with the “Edit cells -> Common transforms” functions “To titlecase” and “Trim leading and trailing whitespace”:
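If you prefer to script this step, the minimal Python sketch below approximates the two Refine transforms. It uses string.capwords, which lower-cases each word and then capitalises its first letter. That is close to Refine’s “To titlecase” but not guaranteed to be identical, and capwords also collapses runs of internal whitespace into single spaces. The sample rows are hypothetical.

```python
import string

def normalise(value):
    """Trim surrounding whitespace, then title-case each word.

    string.capwords splits on whitespace, capitalises each word (lower-casing the
    rest of it) and rejoins with single spaces, roughly mirroring Refine's
    "To titlecase" plus "Trim leading and trailing whitespace".
    """
    return string.capwords(value.strip())

# Hypothetical (funder, implementer) rows; both columns get the same treatment.
rows = [("  MINISTRY OF FINANCE ", "kenya red cross"), ("Cafod", " world food programme")]
cleaned = [(normalise(funder), normalise(implementer)) for funder, implementer in rows]
print(cleaned)
```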


We do this for both columns. Next we want to make sure that the Ministry of Finance is cited consistently in the data. For this, we first expand all mentions of “Min.” to “Ministry” using a transform (“Edit cells -> Transform…”):
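In Refine, the transform expression for this can be as simple as value.replace("Min.", "Ministry"). The minimal Python sketch below does the equivalent with regular expressions and also covers the “Off.” to “Office” case. The \b word boundary is a small safeguard of our own: it stops a match from starting in the middle of a word.

```python
import re

# Abbreviations to expand. The dot is escaped so it matches a literal ".",
# and \b means the abbreviation must start at a word boundary.
EXPANSIONS = {
    r"\bMin\.": "Ministry",
    r"\bOff\.": "Office",
}

def expand_abbreviations(value):
    """Replace each abbreviation with its full form, e.g. 'Min. Of Finance' -> 'Ministry Of Finance'."""
    for pattern, full_form in EXPANSIONS.items():
        value = re.sub(pattern, full_form, value)
    return value

print(expand_abbreviations("Min. Of Finance"))
print(expand_abbreviations("Off. Of The Prime Minister"))
```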


We’ll do the same to expand “Off.” to “Office”. Now let’s use the Refine cluster feature to try to automatically cluster entries that belong together.

We create a text facet using the “Facet -> Text facet” option on the Implementer column. Next, click on the “Cluster” button in the upper right. We do this for both columns. (If you’re not sure how to use this feature, check out our Cleaning Data with Refine recipe.)
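To see why clustering catches variants like “Finance, Ministry Of” and “Ministry Of Finance”, here is a minimal Python sketch of the key-collision idea behind Refine’s “fingerprint” method. Each value is reduced to a key, and values that share a key are grouped together. The key is built by trimming, lower-casing, removing punctuation, folding accents to ASCII, and sorting the unique words. This is an approximation written for illustration, not Refine’s own code, and Refine offers several other clustering methods as well.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Build a key in the style of Refine's fingerprint key-collision method."""
    value = value.strip().lower()
    value = re.sub(r"[^\w\s]", "", value)  # remove punctuation
    # Fold accented characters to plain ASCII (e.g. "é" -> "e").
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    return " ".join(sorted(set(value.split())))  # unique words, sorted

def clusters(values):
    """Group values whose fingerprints collide; keep only groups with 2+ members."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

entries = ["Ministry Of Finance", "Finance, Ministry of", "ministry of finance ", "Unicef"]
print(clusters(entries))
```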

As a last step, we’ll need to get rid of whitespace, as it tends to confuse Gephi when we import. We do this by replacing all spaces with underscores:


Perfect. Now we can export the data to CSV and load it into Gephi.
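The underscore replacement and the CSV export can also be scripted. The minimal Python sketch below assumes the cleaned data is a list of (funder, implementer) pairs and that the column captions are “Funder” and “Implementer”. The include_header flag is our own addition. Since Gephi turns those captions into stray nodes, as described in the next step, writing the file without a header row is one way to avoid removing them by hand.

```python
import csv
import io

def underscore(value):
    """Replace spaces with underscores so each name stays a single token."""
    return value.replace(" ", "_")

def export_csv(rows, include_header=True):
    """Write (funder, implementer) pairs as CSV text, ready to save and open in Gephi."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if include_header:
        writer.writerow(["Funder", "Implementer"])
    for funder, implementer in rows:
        writer.writerow([underscore(funder), underscore(implementer)])
    return out.getvalue()

print(export_csv([("World Food Programme", "Kenya Red Cross")]))
```

To save the result, write the returned text to a file opened with open(path, "w", newline="").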

Network exploration: Gephi

Start Gephi and select “New Project”, then open the CSV file. Gephi doesn’t handle the column captions very well: it imports them as nodes, so you’ll have to switch to “Data Laboratory” and remove the “Funder” and “Implementer” nodes.


Now switch back to “Overview”. Time to do some analysis!

Let’s first turn the labels on. Do this by clicking the “T” icon on the bottom:


Whoa – now it’s hard to read anything. Let’s apply some layouts. Two layouts I’ve found work great in combination are ForceAtlas 2 and Fruchterman Reingold. Let’s apply them both. (Don’t forget to click “Stop” when the layout starts looking good.)
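Force-directed layouts like these treat nodes as particles that push each other apart, while edges pull connected nodes together. The minimal Python sketch below shows the core loop of the Fruchterman-Reingold algorithm under our own simplifications: a fixed unit frame, random initial positions and a simple geometric cooling schedule. Gephi’s implementation has its own parameters (such as area, gravity and speed), and ForceAtlas 2 is a different algorithm, so treat this only as an illustration of the idea.

```python
import math
import random

def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=50, seed=0):
    """Return {node: (x, y)} after a simplified Fruchterman-Reingold layout (nodes: non-empty list)."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal distance between nodes
    temp = width / 10  # maximum move per step, cooled each iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Connected nodes attract with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node, capped by the temperature and kept inside the frame.
        for n in nodes:
            dx, dy = disp[n]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / d * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / d * step))
        temp *= 0.95  # cool down so the layout settles
    return {n: tuple(p) for n, p in pos.items()}

print(fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")]))
```

The cooling temperature serves the same purpose as clicking “Stop” in Gephi: it lets the layout settle instead of moving forever.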


Great! After applying both algorithms, your graph should look similar to the picture below:


OK, now let’s highlight the bigger funders and implementers. We can do this with the text-size adjustment up top:


Great – but the difference seems to be too stark. We can change this with the “Spline…” setting:
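Behind this setting is a simple mapping from a node’s value (here, how many links it has) to a label size. In our Python sketch below, the curve parameter stands in for the “Spline…” editor: a square-root curve lifts small values, which makes the contrast between small and large nodes less stark than a straight linear mapping. The size range of 10 to 40 and the choice of math.sqrt are hypothetical illustrations, not Gephi defaults.

```python
import math
from collections import Counter

def degrees(edges):
    """Count how many links touch each node (funders and implementers alike)."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts

def label_size(value, lo, hi, min_size=10.0, max_size=40.0, curve=None):
    """Map value in [lo, hi] to a size; `curve` (if given) bends the 0..1 interpolation."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    if curve is not None:
        t = curve(t)
    return min_size + t * (max_size - min_size)

# A node with degree 4 in a graph whose degrees run from 1 to 16:
print(label_size(4, 1, 16))                    # linear: small label
print(label_size(4, 1, 16, curve=math.sqrt))   # square-root curve: noticeably larger
```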


OK, now let’s pull the labels apart. There is a Label Adjust layout we’ll use for this; run it for a while. Now our graph looks like this:


Let’s get some colour in. I like the “Modularity” statistic: it groups nodes that are densely connected to each other into communities, so that nodes close to each other can be coloured similarly.
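The quantity behind this statistic is Newman’s modularity Q. It measures how many more links fall inside communities than you would expect by chance. Gephi’s Modularity statistic searches for a partition with a high Q using the Louvain method. The minimal Python sketch below does not search: it only scores a partition you give it, which is enough to show what a “good” community split means. The small example graph is hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman's modularity Q for an undirected graph and a node -> community mapping.

    Q = sum over communities c of (L_c / m - (d_c / 2m) ** 2), where m is the number
    of edges, L_c the number of edges inside c, and d_c the summed degree of c's nodes.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    inside = defaultdict(int)      # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single edge, split into their natural communities.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, split), 3))  # → 0.357
```

Putting every node into one community gives Q = 0. Splits that follow the graph’s dense clusters score higher, and those clusters are what the “Modularity Class” colours show.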


Next, colour the text by “Modularity Class”.


Finally, we change the background colour to make the colours visible nicely.


Now that we’ve done this, let’s export the graph. Go to the “Preview” settings. You’ll quickly note that the graph looks very different. To fix this, try different settings and strategies, switching between “overview” and “preview” until you find a result you’re happy with. Here’s an example of what you can come up with:


What we can clearly see is that some of the funders tend to operate in very different spaces. Look at CAFOD (a Catholic development organization) on the right, or the cluster of USA and UN, WFP and European Commission at the top.

Now you’re equipped with the basics of how to use Gephi for exploring networks – go ahead! Is there something interesting you find? Let us know!
