Exploring IATI funders in Kenya, Part II – cleaning & visualizing the data
Welcome back to a brief exploration of who funds whom in Kenya based on freely available data from IATI (International Aid Transparency Initiative).
In the last post, we extracted the data from IATI using Python. In this post, we’ll clean that data up and visualize it as a network graph to see what it can tell us about aid funding in Kenya.
If you couldn’t follow the code in the last post, don’t worry: we’ll use the GUI tools OpenRefine and Gephi from now on. You can download the result of last post’s data extraction here.
Data cleaning: OpenRefine
First, let’s clean up the data using Refine. Start Refine and then create a new project with the data file. The first thing we’ll do is strip stray whitespace and bring the entries to a common case – I prefer titlecase. We do this with the “Edit cells -> Common transforms” functions “To titlecase” and “Trim leading and trailing whitespace”:
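For readers who would rather script this step, here is a minimal Python sketch of the same two transforms. The function name `normalize_name` is made up for illustration, and `string.capwords` is only an approximation of Refine's behaviour: it also collapses runs of whitespace inside the value, which the trim transform does not.

```python
import string


def normalize_name(value: str) -> str:
    """Trim surrounding whitespace and capitalise every word.

    string.capwords splits on whitespace, capitalises each word
    (lowercasing the rest of it) and joins the words with single spaces.
    That also collapses internal runs of whitespace, which goes slightly
    beyond what the two Refine transforms do.
    """
    return string.capwords(value)


# Hypothetical sample values, not taken from the real data set.
for raw in ["  world bank ", "MINISTRY OF FINANCE"]:
    print(repr(raw), "->", repr(normalize_name(raw)))
```

Applied to every cell in both columns, this should give roughly the same result as the two menu entries.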
We do this for both columns. Next we want to make sure that the Ministry of Finance is cited consistently in the data. For this, we first expand all mentions of “Min.” to “Ministry” using a transform (“Edit cells -> Transform…”):
We’ll also do the same for “Off.” and “Office”. Now let’s use the Refine cluster feature to try to automatically cluster entries that belong together.
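The abbreviation expansion can be sketched in Python as well. This is not the Refine transform itself, just one way to get the same effect; the `ABBREVIATIONS` mapping and the function name are assumptions for illustration.

```python
import re

# Abbreviations mentioned in the post, mapped to their full forms.
ABBREVIATIONS = {"Min.": "Ministry", "Off.": "Office"}


def expand_abbreviations(value: str) -> str:
    """Replace each known abbreviation with its full form."""
    for short, full in ABBREVIATIONS.items():
        # \b keeps the match at the start of a word, so "Admin." is left
        # alone; re.escape makes the trailing dot match literally.
        value = re.sub(r"\b" + re.escape(short), full, value)
    return value


print(expand_abbreviations("Min. Of Finance"))
print(expand_abbreviations("Off. Of The President"))
```

The match is case-sensitive, so it assumes the values were already brought to titlecase.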
We create a text facet using the “Facet -> Text facet” option on the Implementers column. Next, click on the “Cluster” button in the upper right of the facet and merge the entries that belong together. We do this for both columns. (If you’re not sure how to use this feature, check out our Cleaning Data with Refine recipe.)
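To give an idea of what clustering does under the hood, here is a rough Python sketch of a "fingerprint" key-collision approach: values that reduce to the same normalised key are suggested as one cluster. The post doesn't say which clustering method was used, so treat this as an illustration of the general idea rather than a reimplementation of Refine; both function names are made up.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a name to a key that ignores case, punctuation and word order."""
    value = value.strip().lower()
    # Fold accented characters to their closest ASCII equivalents.
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, keeping word characters and whitespace.
    value = re.sub(r"[^\w\s]", "", value)
    # Sort and de-duplicate the tokens so word order doesn't matter.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group distinct values sharing a fingerprint; keep groups of 2 or more."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical sample values, not taken from the real data set.
names = ["Ministry Of Finance", "Finance, Ministry Of", "World Bank"]
print(cluster(names))
```

Each suggested cluster still needs a human decision about which spelling to keep, just as in the Refine dialog.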
As a last step, we’ll need to get rid of whitespace, as it tends to confuse Gephi when we import. We do this by replacing all spaces with underscores:
Perfect. Now we can export the data to CSV and load it into Gephi.
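If you are scripting the cleanup instead, the last two steps could look like the sketch below: it replaces spaces with underscores and writes one funder/implementer pair per line. The function name is hypothetical, and leaving out the header row is a choice of this sketch, not something the Refine export does.

```python
import csv
import io


def to_edge_csv(pairs) -> str:
    """Return CSV text with one 'funder,implementer' edge per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for funder, implementer in pairs:
        # Spaces become underscores so each name stays a single token.
        writer.writerow([funder.replace(" ", "_"),
                         implementer.replace(" ", "_")])
    return buffer.getvalue()


# Hypothetical sample pair, not taken from the real data set.
print(to_edge_csv([("World Bank", "Ministry Of Finance")]))
```

Writing the returned text to a `.csv` file gives something Gephi can open as a list of edges.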
Network exploration: Gephi
Start Gephi and select “New Project”, then open the CSV file. For some reason, Gephi doesn’t handle the header row very well and imports the column names as nodes, so you’ll have to switch to “Data Laboratory” and remove the “Funder” and “Implementer” nodes.
Now switch back to “Overview”. Time to do some analysis!
Let’s first turn the labels on. Do this by clicking the “T” icon on the bottom:
Whoa – now it’s hard to read anything. Let’s apply a layout. Two layouts I’ve found work great in combination are ForceAtlas 2 and Fruchterman Reingold. Let’s apply them both. (Don’t forget to click “Stop” when the layout starts looking good.)
Great! After applying both algorithms, your graph should look similar to the picture below:
OK, now let’s highlight the bigger funders and implementers. We can do this with the text-size adjustment up top:
Great – but the difference seems to be too stark. We can change this with the “Spline…” setting:
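The idea behind these two steps can be sketched in a few lines of Python. This assumes that "bigger" means "has more connections" (node degree), which the post doesn't state explicitly, and it uses a simple power curve as a stand-in for Gephi's spline editor. All names and default values are made up for illustration.

```python
from collections import Counter


def degrees(edges):
    """Count how many edges touch each node."""
    counts = Counter()
    for funder, implementer in edges:
        counts[funder] += 1
        counts[implementer] += 1
    return counts


def label_sizes(counts, min_size=1.0, max_size=4.0, exponent=0.5):
    """Map degrees onto label sizes between min_size and max_size.

    exponent=1.0 is a straight linear mapping; values below 1 lift the
    mid-sized nodes so the gap to the largest ones looks less stark.
    """
    lowest, highest = min(counts.values()), max(counts.values())
    span = (highest - lowest) or 1  # avoid dividing by zero
    return {
        node: min_size + (max_size - min_size)
        * ((degree - lowest) / span) ** exponent
        for node, degree in counts.items()
    }


# Hypothetical edges, not taken from the real data set.
edges = [("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "X")]
print(label_sizes(degrees(edges)))
```

Comparing the output for `exponent=1.0` and `exponent=0.5` shows the same kind of flattening the “Spline…” setting is used for here.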
OK, now let’s separate the overlapping labels. There is a “Label Adjust” layout we’ll use for this. Run it for a while. Now our graph looks like this:
Let’s add some colour. I like the “Modularity” statistic – it groups densely connected nodes into classes, so nodes that are close to each other get a similar colour.
Next, colour the text by “Modularity Class”.
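To get a feel for what the modularity number measures, here is a small Python sketch that scores a given grouping of an undirected, unweighted graph using the standard modularity formula. It only evaluates a partition you hand it; it does not search for the best one, which is what Gephi's statistic does for you. The function name and the toy graph are made up for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the summed degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] += 1
        degree_sum[cb] += 1
        if ca == cb:
            internal[ca] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
one_group = {node: 0 for node in range(1, 7)}
print(modularity(edges, two_groups))  # splitting at the bridge scores higher
print(modularity(edges, one_group))
```

A higher score means more edges fall inside the groups than you would expect by chance, which is why nodes in the same class tend to sit close together in the layout.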
Finally, we change the background colour so that the colours stand out.
Now that we’ve done this, let’s export the graph. Go to the “Preview” settings. You’ll quickly note that the graph looks very different. To fix this, try different settings and strategies, switching between “Overview” and “Preview” until you find a result you’re happy with. Here’s an example of what you can come up with:
What we can clearly see is that some of the funders tend to operate in very different spaces. Look at CAFOD (a Catholic development organization) on the right, or the cluster of USA and UN, WFP and European Commission at the top.
Now you’re equipped with the basics of how to use Gephi for exploring networks – go ahead! Did you find something interesting? Let us know!