Glossary

Anonymisation
The process of treating data such that it cannot be used for the identification of individuals.
API
See Application Programming Interface.
Application Programming Interface
A way computer programs talk to one another. Can be understood in terms of how a programmer sends instructions between programs.
Attribution Licence
A licence that requires attributing the original source of the licensed material.
BitTorrent
BitTorrent is a protocol that spreads the bandwidth needed to transfer very large files across the computers participating in the transfer. Rather than downloading a file from a specific source, BitTorrent allows peers to download from each other.
Boolean logic
A form of algebra in which all values are reduced to either TRUE or FALSE.
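As a small illustration of the definition above, the sketch below shows the three basic Boolean operations in Python. The variable names and values are made up for the example.

```python
# Hypothetical values chosen only for illustration.
has_licence = True
is_machine_readable = False

both = has_licence and is_machine_readable    # AND: TRUE only if both are TRUE
either = has_licence or is_machine_readable   # OR: TRUE if at least one is TRUE
negated = not has_licence                     # NOT: flips TRUE and FALSE

# A comparison also reduces to TRUE or FALSE.
population = 1_500_000
is_large = population > 1_000_000

print(both, either, negated, is_large)
```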
Categorical Data
Data that helps put things into categories. E.g. country names, groups, conditions, tags.
Choropleth Map
A choropleth map is a map where values are encoded onto regions using color mapping. Each region is colored as a whole according to its underlying value.
Comma-separated Values
See CSV.
Continuous Data
Numerical data that, if you plot all possible values, has no gaps. E.g. sizes (you can be 155.55 or 155.56 cm tall, etc.). Compare to Discrete Data.
Crowdsourcing
A blend of "crowd" and "outsourcing": having a lot of people each do simple tasks to complete a larger piece of work.
CSV
Comma Separated Values. A very simple, open format for tabular data which can be exported and imported by all spreadsheet applications and is easily manipulable with command line tools.
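A minimal sketch of reading CSV with Python's standard csv module. The sample data is made up and held in a string so that the example runs without a file on disk.

```python
import csv
import io

# Made-up sample data standing in for the contents of a .csv file.
raw = "name,visitors\nLibrary,120\nMuseum,85\n"

# DictReader uses the first line as the column names.
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    # Every field arrives as text, so numbers need converting.
    print(row["name"], int(row["visitors"]))

total_visitors = sum(int(row["visitors"]) for row in rows)
```

With a real file, the same reader can be given the object returned by open() in place of the StringIO wrapper.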
curl
http://curl.haxx.se/ – a command line tool for transferring data to and from online systems over standard internet protocols including FTP and HTTP. Very powerful and great for working with Web APIs from the command line.
DAP
See Data Access Protocol.
Data Access Protocol
A system that allows outsiders to be granted access to databases without overloading either system.
Discrete Data
Numerical data that, if you plot all possible values, has gaps in it. E.g. the count of things (there are no 1.5 children). Compare to Continuous Data.
etherpad
A piece of software for collaborative real-time editing of text. See http://etherpad.org/.
GDP
Gross domestic product (GDP) is the market value of all officially recognized goods and services produced within a country in a given period of time. GDP per capita is often considered an indicator of a country’s standard of living. (Source: Wikipedia.)
Geocode
See Geocoding.
Geocoding
From Geographical Coding. Describes the practice of attaching geographical coordinates to items.
GeoJSON
GeoJSON is a format for encoding a variety of geographic data structures. It is based on the JSON specification. More documentation can be found on http://www.geojson.org
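To give a feel for the format, here is a sketch of a single GeoJSON point feature built and read back with Python's json module. The coordinates and the property value are made up. GeoJSON lists positions as longitude first, then latitude.

```python
import json

# A hypothetical point feature; the values are for illustration only.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [13.4, 52.5],  # [longitude, latitude]
    },
    "properties": {"name": "Example point"},
}

text = json.dumps(feature)   # GeoJSON is ordinary JSON text
parsed = json.loads(text)
lon, lat = parsed["geometry"]["coordinates"]
print(parsed["properties"]["name"], lon, lat)
```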
Intellectual property rights
Monopolies granted to individuals for intellectual creations.
IP rights
See Intellectual property rights.
JSON
JavaScript Object Notation. A common format to exchange data. Although it is derived from JavaScript, libraries to parse JSON data exist for many programming languages. Its compact style and ease of use have made it widespread. To make viewing JSON in a browser easier you can install a plugin such as JSONView, available for Chrome and Firefox.
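As one example of such a library, Python ships a json module. The sketch below parses a made-up JSON string and writes it back out in indented form.

```python
import json

# A made-up JSON document, as it might arrive from a file or a Web API.
text = '{"name": "example dataset", "rows": 3, "tags": ["open", "data"], "published": true}'

record = json.loads(text)   # JSON text -> Python dict

# JSON types map onto Python types: object -> dict, array -> list,
# true/false -> True/False, number -> int or float.
print(record["name"], record["rows"], record["tags"][0], record["published"])

pretty = json.dumps(record, indent=2)   # Python dict -> readable JSON text
print(pretty)
```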
Machine-readable
Machine-readable formats are ones from which computer programs can easily extract data. PDF documents are not machine readable: computers can display the text nicely, but have great difficulty understanding the context that surrounds it. Common machine-readable file formats are CSV and Excel files.
Mean
The arithmetic mean of a set of values. Calculated by summing up all values and then dividing by the number of values.
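The calculation described above, sketched in Python with made-up numbers and checked against the standard statistics module:

```python
import statistics

values = [2, 4, 4, 10]   # made-up sample values

# Sum all values, then divide by how many there are: 20 / 4.
mean_by_hand = sum(values) / len(values)

# The standard library gives the same result.
mean_from_library = statistics.mean(values)

print(mean_by_hand)
```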
Median
The median is the value that splits an ordered set of values in half: 50% of the values lie below it and 50% lie above it.
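A minimal sketch of finding the median by hand, using made-up numbers. It assumes the common convention that, for an even number of values, the median is the average of the two middle values.

```python
import statistics

def median(values):
    """Return the middle value of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return ordered[mid]
    # Even count: average the two middle values (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([7, 1, 3])       # sorted: 1, 3, 7
even_case = median([9, 1, 7, 3])   # sorted: 1, 3, 7, 9
print(odd_case, even_case)
```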
Normal Distribution
The normal (or Gaussian) distribution is a continuous probability distribution with a bell shaped curve.
Open Data
Open data is data that can be used, reused and redistributed freely by anyone for any purpose. More details can be found at opendefinition.org.
Open standards
Generally understood as technical standards which are free from licensing restrictions. Can also be interpreted to mean standards which are developed in a vendor-neutral manner.
Percentiles
A percentile is a value below which n% of the values in a given range fall. E.g. the 5th percentile: 5 percent of values are lower than this value.
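There are several conventions for computing percentiles, and they can give slightly different answers on small data sets. The sketch below uses the simple "nearest rank" convention, which picks the smallest value that has at least n% of the data at or below it. The function name and the data are made up for the example.

```python
def percentile_nearest_rank(values, p):
    """Return the p-th percentile (whole number p from 1 to 100), nearest-rank method."""
    ordered = sorted(values)
    # Smallest rank with at least p% of the values at or below it,
    # i.e. ceil(p * n / 100), computed with whole-number arithmetic.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]

scores = list(range(1, 21))   # made-up data: the whole numbers 1 to 20

fifth = percentile_nearest_rank(scores, 5)
fiftieth = percentile_nearest_rank(scores, 50)
ninety_fifth = percentile_nearest_rank(scores, 95)
print(fifth, fiftieth, ninety_fifth)
```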
Public domain
No copyright exists over the work. Does not exist in all jurisdictions.
Qualitative Data
Qualitative data is data telling you something about qualities, e.g. descriptions, colors, etc. Interviews count as qualitative data.
Quantitative Data
Quantitative data tells you something about a measure or quantification, such as the quantity of things you have, the size (if measured), etc.
Quartiles
Quartiles are the three values below which 25%, 50% and 75% of the values in a range fall.
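A short sketch using statistics.quantiles from Python's standard library (Python 3.8 or later), which returns the three cut points when asked for four groups. The data is made up, and other tools may interpolate differently and report slightly different quartiles.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]   # made-up sample values

# n=4 splits the data into four equal-sized groups,
# so three cut points come back: the quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1, q2, q3)

# The second quartile is the median.
print(q2 == statistics.median(data))
```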
Readme
A file (usually named README or README.txt) that explains to new users what the current directory or set of files is about. This is very commonly found in open source software projects, and including one with other publications (such as datasets) is considered good practice. The file usually contains a short description of what to expect.
Scraping
The process of extracting data in machine-readable formats from non-pure data sources, e.g. webpages or PDF documents. Often prefixed with the source (web-scraping, PDF-scraping).
Share-alike Licence
A licence that requires users of a work to provide the content under the same or similar conditions as the original.
Tab-separated values
Tab-separated values (TSV) are a very common form of text file format for sharing tabular data. The format is extremely simple and highly machine-readable.
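Python's csv module can also read tab-separated text if it is told to use a tab as the delimiter. A minimal sketch with made-up data:

```python
import csv
import io

# Made-up sample data; "\t" is the tab character between columns.
raw = "name\tvisitors\nLibrary\t120\nMuseum\t85\n"

table = list(csv.reader(io.StringIO(raw), delimiter="\t"))
header, records = table[0], table[1:]

print(header)
print(records)
```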
Taxonomy
Classification. Taxonomy refers to hierarchical classification of things. One of the best known is the Linnaean classification of species – still used today to classify all living beings.
Web API
An API that is designed to work over the Internet.

Last updated on Sep 02, 2013.