An Introduction to Mapping Company Networks Using Gephi and OpenCorporates, via OpenRefine

November 15, 2013 in Infoskills, OpenRefine, recipe

As more and more information about beneficial company ownership is made public under open license terms, we are likely to see an increase in the investigative use of this sort of data.

But how do we even start to work with such data? One way is to try to start making sense of it by visualising the networks that reveal themselves as we start to learn that company A has subsidiaries B and C, and major shareholdings in companies D, E and F, and that those companies in turn have ownership relationships with other companies or each other.

But how can we go about visualising such networks?!

This walkthrough shows one way, using company network data downloaded from OpenCorporates using OpenRefine, and then visualised using Gephi, a cross-platform desktop application for visualising large network data sets: Mapping Corporate Networks – Intro (slide deck version).

The walkthrough also serves as a quick intro to the following data wrangling activities, and can be used as a quick tutorial to cover each of them.

  • how to hack a web address/URL to get data-as-data from a web page (doesn’t work everywhere, unfortunately;
  • how to get company ownerships network data out of OpenCorporates;
  • how to download JSON data and get it into a nice spreadsheet/tabular data format using OpenRefine;
  • how to filter a tabular data file to save just the columns you want;
  • a quick intro to using the Gephi netwrok visualisation tool;
  • how to visualise a simple date file containing a list of how companies connect using Gephi;

Download it here: Mapping Corporate Networks – Intro.

So if you’ve ever wondered how to download JSON data so you can load it into a spreadsheet, or how to visualise how two lists of things relate to each other using Gephi, give it a go… We’d love to hear any comments you have on the walkthrough too, (what you liked, what you didn’t, what’s missing, what’s superfluous, what worked well for you, what didn’t and most of all – what use you put to anything you learned from the tutorial!:-)

If you would like to learn more about working with company network data, see the School of Data blogpost Working With Company Data which links to additional resources.

flattr this!

  • Wodin Praxis

    One typo netwrok should be network.