When A Government Minister’s Data Laundry is Hung Out to Dry…
A well-targeted Freedom of Information request to the UK Department for Education, and the reporting that followed it, hit the news here recently. It turns out that a claim by the Minister for Education, Michael Gove, that “[s]urvey after survey has revealed disturbing historical ignorance” was largely based on a series of PR-related, media-sponsored polls – either directly or, as a cynic might imagine, as the result of a feverish, opportunistic retrofitting exercise (i.e. a post hoc rationale).
Fact checkers Full Fact also reported that a recent claim made by the Work and Pensions Secretary was “unsupported by official statistics published by [his own] Department” (Education surveys and benefit statistics: two government departments caught out in one week).
Both these examples demonstrate what I have started to think of as a particular form of data laundering. Although the term is ill-defined, I’ve previously summarised several crude definitions as follows:
- “[o]bscuring, removing, or fabricating the provenance of illegally obtained data such that it may be used for lawful purposes” ([SectorPrivate’s] definition of Data Laundering – as inspired by William Gibson from Mona Lisa Overdrive).
- “Companies are paying a lot of money for personal and group profiles and there are market actors in position to sell them. This is clearly against data protection principles. This phenomenon is known as ‘data laundering’. Similar to money laundering, data laundering aims to make illegally obtained personal data look as if they were obtained legally, so that they can be used to target customers.” [ThinkMind // International Journal On Advances in Security, volume 2, numbers 2 and 3, 2009. See also: Joshua L. Simmons (Kirkland & Ellis LLP), “Buying You: The Government’s Use of Fourth-Parties to Launder Data about ‘The People’”, Columbia Business Law Review, Vol. 2009, No. 3, p. 950, September 19, 2009];
- various definitions relating to the role of data cleansing;
- in library catalogues in particular, the replacement of metadata records or fields that are tainted with commercial license restrictions by data of equivalent or higher quality, known provenance and open license terms;
- a process in which the low quality origins of a dataset are masked by the provenance, authority or veneer of quality associated with another, trusted agent, such that the data comes to be accepted “at face value” with the imprimatur of that trusted party. The “wash and rinse” process can be repeated to give the dataset or claim ever more weight with each restatement. (For an example, see Sleight of Hand and Data Laundering in Evidence Based Policy Making.)
It is in this final sense, of giving a weak claim a weightier basis, that Gove in particular appears to have acted: whilst the “fact” that UK schoolchildren demonstrate woeful historical ignorance is now known to have been sourced from dubious PR-commissioned polls, what might have been remembered was simply that a Minister had stated it was true, with the full implied backing of his Department. And it is this “fact” that might then end up being repeated in ever more formal policy-setting situations, and actually help drive the adoption of a particular policy. (A famous example of one source of information being assumed to have a different provenance to what was actually the case is the “dodgy dossier”, in which “large parts of [a] British government … dossier on Iraq – allegedly based on “intelligence material” – were taken from published academic articles, some of them several years old.” On the question of polls, it’s worth remembering that they are often commissioned, and reported, in the context of different lobbying aims, and may even be framed to make one preferred outcome more likely than an unfavoured one: Two can play at that game: When polls collide.)
Note that this form of data misuse is different to the recent case of the Reinhart-Rogoff academic paper, which a student replication project showed to contain errors, to frame the data in a particular way, and to hang a hardline point on an arbitrary threshold value. In the “R & R” case, academic evidence was used to support a particular policy decision, but the evidence was then found to contain errors that, arguably, undermined the validity of its claims – the very claims that supported the adoption of one particular policy over another.
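By way of a toy illustration only (the numbers below are invented for the example, not the actual Reinhart-Rogoff data), a few lines of Python show how a spreadsheet-style row exclusion, combined with an arbitrary threshold choice, can flip the sign of a headline figure – exactly the sort of fragility a replication exercise is designed to surface:

```python
# Toy illustration: hypothetical (country, debt-to-GDP %, growth %) rows.
# These values are invented for the sketch, not real data.
rows = [
    ("A", 95, -0.5), ("B", 105, 0.2), ("C", 120, 2.8),
    ("D", 110, 2.3), ("E", 92, 2.1),
]

THRESHOLD = 90  # the arbitrary cut-off around which a headline claim is framed

def mean_growth(data, threshold=THRESHOLD):
    """Average growth rate across countries whose debt exceeds the threshold."""
    above = [growth for _, debt, growth in data if debt > threshold]
    return sum(above) / len(above)

print(mean_growth(rows))      # all five rows included: ~1.38 (modest growth)
print(mean_growth(rows[:2]))  # last three rows silently dropped: ~-0.15
```

With the full (made-up) dataset, the “high debt” countries grow modestly on average; accidentally drop a few rows and the average turns negative, supporting a far more dramatic headline. The same sensitivity applies to where the threshold itself happens to be drawn.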
As data finds its way into ever more news reports and official reports, it may at times be worth treating it as “hearsay” rather than demonstrated “fact” if you can’t get clear information about how, when and by whom the data was originally collected and analysed. In other words, you may at times need to follow the data (h/t Paul Bradshaw).
PS I am also reminded of the phrase “zombie statistic”, used to describe those numbers that get quoted in report after report, that never seem to die no matter how often they are contested, and whose provenance is obscured by the mists of time. The full extent of the relationship between zombie statistics and data laundering is left as a question for further research… or the comments below… ;-)