Introduction
Good documentation on data provenance (the origin and history of a dataset) can be compared to the chain of custody which is maintained for criminal investigations: each previous owner of a dataset must be identified, and they are held accountable for the processing and cleaning operations they have performed on the data. For Excel spreadsheets this would include writing down the steps taken in transforming the data, while advanced data tools (such as Open Refine, formerly Google Refine), often provide methods of exporting machine-readable data containing processing history. Any programs that have been written to process the data should be available when users access your end result and shared as open-source code on a public code sharing site such as GitHub.
