WhatDoTheyKnow Team Urge Caution When Using Excel to Depersonalise Data
[Guest Cross-post: from Myfanway Nixon of mySociety. You can learn more about her on her blog. The original post can be found here. Thanks for sharing!]
mySociety’s Freedom of Information website WhatDoTheyKnow is used to make around 15 to 20% of FOI requests to central government departments and in total over 160,000 FOI requests have been made via the site.
Occasionally, in a very small fraction of cases, public bodies accidentally release information in response to a FOI request which they intended to withhold. This has been happening for some time and there have been various ways in which public bodies have made errors. We have recently, though, come across a type of mistake public bodies have been making which we find particularly concerning as it has been leading to large accidental releases of personal information.
What we believe happens is that when officers within public bodies attempt to prepare information for release using Microsoft Excel, they import personally identifiable information and an attempt is made to summarise it in anonymous form, often using pivot tables or charts.
What those working in public bodies have been failing to appreciate is that while they may have hidden the original source data from their view, once they have produced a summary it is often still present in the Excel workbook and can easily be accessed. When pivot tables are used, a cached copy of the data will remain, even when the source data appears to have been deleted from the workbook.
When we say the information can easily be accessed, we don’t mean by a computing genius but that it can be accessed by a regular user of Excel.
We have seen a variety of public bodies, including councils, the police, and parts of the NHS, accidentally release personal information in this way. While the problem is clearly the responsibility of the public bodies, it does concern us because some of the material ends up on our website (it often ends up on public bodies’ own FOI disclosure logs too).
We strive to run the WhatDoTheyKnow.com website in a responsible manner and promptly take down inappropriately released personal information from our website when our attention is drawn to it. There’s a button on every request thread for reporting it to the site’s administrators.
As well as publishing this blog post in an effort to alert public bodies to the problem, and encourage them to tighten up their procedures, we’ve previously drawn attention to the issue of data in “hidden” tabs on Excel spreadsheets in our statement following an accidental release by Islington council; one of our volunteers has raised the issue at a training event for police FOI officers, and we’ve also been in direct contact with the Information Commissioner’s office both in relation to specific cases, and trying to help them understand the extent of the problem more generally.
Some of our suggestions:
- Don’t release Excel pivot tables created from spreadsheets containing personal information, as the source data is likely to be still present in the Excel file.
- Ensure those within an organisation who are responsible for anonymising data for release have the technical competence to fulfil their roles.
- Check the file sizes. If a file is a lot bigger than it ought to be, it could be that there are thousands of rows of data still present in it that you don’t want to release.
- Consider preparing information in a plain text format, eg. CSV, so you can review the contents of the file before release.