On the Need for Corporate Identifiers
Earlier this year, following the Bangladesh factory fire and subsequent building collapse, the School of Data ran a data expedition around Bangladeshi garment factory data (for example: Data Expedition story: Why garment retailers need to do more in Bangladesh; “critique”). A second, follow-on data expedition will take place later this month: Online Data Expedition: Investigate the Garment Factories, October 18-20. To support this expedition, we hope to have a few more datasets to play with, including additional supplier lists, names of factories used by companies who have signed up to the Accord on Fire and Building Safety in Bangladesh, and a list of member companies of the Bangladesh Garment Manufacturers and Exporters Association (BGMEA).
Whilst the details of the investigations to be pursued as part of the data expedition are still to be decided on, the release of the list of factories (scraped data here) identified by signatories to the Accord suggests one possibility: to what extent are Bangladeshi suppliers for particular brands on this list? A related question might ask: to what extent are members of the BGMEA also signatories to the Accord?
Generalising these two questions, we have a question of the form: to what extent are members of one list of companies also members of a second list?
As the School of Data post Finding Matching Items on Separate Lists – Bangladeshi Garment Factories describes, this may not be as simple a task as we might hope for it to be…
Here’s one example why: it is easy for us to recognise that the following variations on a fictional company name – The Company Limited; The Company Ltd.; THE COMPANY LTD – all represent the same company; and we might also recognise variations on the theme: TCL, T.C.L., T-C-L as referring to the same thing. As human readers, if one variant appears on one list and another variant on the second list, we are usually able to tell that the two variants refer to the same company.
But for a simple exact string matching lookup that we are likely to use in a software application or simple computer matching algorithm, where all the characters must match exactly (even if case is ignored), we will find that “The Company Limited” is NOT matched with “The Company Ltd”, and even the full stop in “The Company Ltd.” prevents that variant being matched with “The Company Ltd”.
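As a minimal sketch of the problem, the following Python snippet runs a case-insensitive exact match over two short lists built from the fictional name variants above. The list contents and the `exact_matches` helper are assumptions made purely for illustration.

```python
# Two hypothetical lists, built from the fictional name variants above.
list_a = ["The Company Limited", "THE COMPANY LTD"]
list_b = ["The Company Ltd", "The Company Ltd."]


def exact_matches(names_a, names_b):
    """Return the names in names_a that equal a name in names_b, ignoring case only."""
    lookup = {name.lower() for name in names_b}
    return [name for name in names_a if name.lower() in lookup]


matches = exact_matches(list_a, list_b)

# Only the variant that differs in case alone is found; "Limited" versus "Ltd"
# and the trailing full stop both defeat the exact comparison.
print(matches)
```

Ignoring case rescues one variant, but every other difference in spelling or punctuation needs its own special-case rule, which is why name matching alone quickly becomes fragile.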
In order to unambiguously match companies, then, we need a better way: a way that is not subject to the arbitrary (although possibly conventional) way one person may record a specific piece of information, such as a company name, compared to another. In other words, we need to start making more use of unambiguous, unique identifiers that we can all agree on as standing for a particular company or organisation.
In the case of companies, this might conveniently be the unique company registration number associated with a company in a particular jurisdiction. In order to decode such company numbers, or develop tools that allow the easy lookup of company numbers so that they can be incorporated into datasets, we need ready access to company registers, which argues in favour of releasing these registers as open data. At a larger scale, initiatives such as OpenCorporates are seeking to develop a single database with an identifier for each corporate legal entity in the world, built on top of such open data releases.
So how would this help us? In the case of the Bangladeshi garment factories, if the signatories to the Accord included a company registration number on the Accord, and the members of the BGMEA also listed their company registration number, we could then unambiguously say whether or not a signatory to the Accord corresponded with a particular member of the BGMEA.
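To illustrate, here is a rough sketch of how such a comparison might look if both lists carried a registration number. Everything in it is invented for the example: the company names, the `reg_no` field, the format of the numbers and the `match_by_reg_no` helper are all assumptions, not taken from the real Accord or BGMEA data.

```python
# Hypothetical records: names and registration numbers are invented.
accord_signatories = [
    {"name": "The Company Limited", "reg_no": "C-00123"},
    {"name": "Another Garments Ltd.", "reg_no": "C-00456"},
]
bgmea_members = [
    {"name": "THE COMPANY LTD", "reg_no": "C-00123"},
    {"name": "Third Textiles Ltd", "reg_no": "C-00789"},
]


def match_by_reg_no(list_a, list_b):
    """Pair up records from the two lists that share a registration number."""
    by_reg_no = {row["reg_no"]: row for row in list_b}
    return [
        (row, by_reg_no[row["reg_no"]])
        for row in list_a
        if row["reg_no"] in by_reg_no
    ]


pairs = match_by_reg_no(accord_signatories, bgmea_members)
for signatory, member in pairs:
    print(signatory["reg_no"], signatory["name"], "<->", member["name"])
```

The two records are paired even though the names are written differently, because the comparison never looks at the name at all.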
If the supplier/factory lists published by the brands listed company registration numbers of the factory operators, we could unambiguously compare supplier lists with signatories to the Accord. Civil society organisations have campaigned for greater transparency in this field for years. More recently, however, for-profit companies have also recognised that they often do not know who is sewing their clothes. In other words, if a global garment brand wants to ensure that every one of the 774 factories in its global network is certified under agreements such as the Bangladesh Safety Accord, this is in fact incredibly difficult to monitor as long as companies do not have unique identifiers.
In each case, the publication of company identifiers would improve transparency, because there would be no ambiguity about which companies were being referred to. However, there might still be some issues in distinguishing between:
- the operator of a particular factory premises;
- a company that was manufacturing garments within a particular factory;
- the particular factory premises within which garment manufacturing was taking place.
As ever, once you start trying to pin down the data associated with a particular investigation, you need to be very clear about what exactly the data refers to!
Nonetheless, the point can still be made that where companies are mentioned as such, it also makes sense to include a reference to their company registration number if possible, because this uniquely and unambiguously identifies a company in a way that just providing its name does not.