Working With Company Data

October 31, 2013 in Events, HowTo

We all think we know what we mean by “a company”, such as the energy giants Shell or BP, but what is a company exactly? As OpenOil’s Amit Naresh explained in our OGP Workshop on “Working With Company Data” last week, the corporate structure of many multinational companies is a complex network of interconnected countries domiciled or registered in a wide variety of countries across the world in order to benefit from tax breaks and intricate financial dealings.

Given the structure of corporate networks can be so complex, how can we start to unpick and explore the data associated with company networks?

The following presentation – available here: School of Data: Company Networks – describes some of the ways in which we can start to map corporate networks using open company data published by OpenCorporates using OpenRefine.


We can also use OpenRefine to harvest data from OpenCorporates relating to the directors associated with a particular company or list of companies: School of Data: Grabbing Director Dara

A possible untapped route to harvesting company data is Wikipedia. The DBpedia project harvests structured data from Wikipedia and makes it available as a single, queryable Linked Data datasource. An example of the sorts of network that can be uncovered from independently maintained maintained Wikipedia pages is shown by this network that uncovers “influenced by” relationships between philosophers, as described on Wikipedia:

WIkipedia philosophers influence map

See Visualising Related Entries in Wikipedia Using Gephi and Mapping Related Musical Genres on Wikipedia/DBPedia With Gephi for examples of how to generate such maps directly from Wikipedia using the cross-platform Gephi application. For examples of the sorts of data available from DBpedia around companies, see:

Using Wikipedia – or otherwise hosted versions of the MediWiki application that Wikipedia sits on top of – there is great potential for using the power of the crowd to uncover the rich network of connections that exist between companies, if we can identify and agree on a set of descriptive relations that we can use consistently to structure data published via wiki pages…

Flattr this!