Keeping the data around

As you retrieve data from the government (or other sources), it’s tempting to treat the websites where it was released as a permanent resource. Experience has shown, however, that data does go away, whether through governments redesigning their websites, new policies that roll back transparency rules, or simple system failures.

Fortunately, downloading complete copies of websites, a process called mirroring, is a well-established technique that civil society organisations can easily deploy. Mirroring uses an automated program called a web crawler (for a list, see http://en.wikipedia.org/wiki/Web_crawler), which starts from a specified page, such as a ministry’s home page, and follows its links to harvest every page it can reach. In most cases, it is also possible to find old versions of websites via the Internet Archive’s Wayback Machine (http://archive.org/web/web.php), a project that aims to create up-to-date copies of all public websites and archive them forever.
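
The crawling process can be sketched in Python using only the standard library. This is a minimal, hypothetical crawler, not one of the tools on the Wikipedia list, and the names `mirror`, `local_path`, `fetch_url` and `LinkExtractor` are illustrative. It assumes the site is served over HTTP(S), stays on the start page’s host, follows only `<a href>` links, and saves each response under a directory named after that host.

```python
import os
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the targets of <a href="..."> links found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url):
    """Download a URL; return its raw bytes and its Content-Type header."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read(), response.headers.get("Content-Type", "")


def local_path(url, out_dir):
    """Map a URL to a file under out_dir, e.g. http://host/about/ -> host/about/index.html."""
    parsed = urlparse(url)
    path = parsed.path
    if not path or path.endswith("/"):
        path += "index.html"
    # Drop empty and dot segments so a URL cannot point outside out_dir.
    parts = [p for p in path.split("/") if p not in ("", ".", "..")]
    return os.path.join(out_dir, parsed.netloc, *parts)


def mirror(start_url, out_dir, max_pages=100, fetch=fetch_url):
    """Breadth-first crawl from start_url, saving every page on the same host.

    Returns the URLs that were saved, in crawl order.
    """
    start_url = urldefrag(start_url)[0]
    start_host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    saved = []
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        try:
            body, content_type = fetch(url)
            path = local_path(url, out_dir)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(body)
        except OSError as err:  # urllib's URLError and HTTPError subclass OSError
            print(f"skipping {url}: {err}")
            continue
        saved.append(url)
        if "html" not in content_type.lower():
            continue  # data files (CSV, PDF, ...) are saved but not parsed for links
        parser = LinkExtractor()
        parser.feed(body.decode("utf-8", errors="replace"))
        parser.close()
        for href in parser.links:
            # Resolve relative links and strip #fragments so each page is fetched once.
            link = urldefrag(urljoin(url, href))[0]
            parsed = urlparse(link)
            if (parsed.scheme in ("http", "https")
                    and parsed.netloc == start_host
                    and link not in seen):
                seen.add(link)
                queue.append(link)
    return saved
```

For example, `mirror("http://www.example.gov/", "mirror")` would save up to 100 pages of that hypothetical ministry site under `mirror/www.example.gov/`. The `fetch` parameter lets the download step be swapped out, for instance to add a delay between requests. This sketch ignores robots.txt, images, stylesheets and query strings (pages differing only by query string overwrite each other); established tools such as GNU Wget (`wget --mirror`) handle these cases.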

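To check programmatically whether the Wayback Machine holds a copy of a page, archive.org also offers an availability API at `https://archive.org/wayback/available`. It takes a `url` parameter and an optional `timestamp` (YYYYMMDDhhmmss, or a prefix of it) and returns JSON describing the closest archived snapshot. The sketch below assumes the response shape documented by archive.org (an `archived_snapshots` object containing `closest`, with `available`, `url` and `timestamp` fields); the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL for the Wayback Machine availability API."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # e.g. "20060101"; the closest snapshot is returned
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response_json):
    """Return (snapshot_url, timestamp) from a parsed API response, or None if nothing is archived."""
    closest = response_json.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def find_snapshot(page_url, timestamp=None):
    """Query the live API (requires network access)."""
    with urllib.request.urlopen(availability_url(page_url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

For instance, `find_snapshot("example.com", "20060101")` would return the snapshot closest to 1 January 2006, or `None` if the page was never archived. Parsing is kept separate from the network call so that it can be checked against a saved response.
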
Last updated on Sep 02, 2013.