
Data Glossary

Anonymisation: The process of treating data such that it cannot be used for the identification of individuals.
API (Application Programming Interface): A way for computer programs to talk to one another. It defines how a programmer can send instructions and data from one program to another.
Attribution Licence: A licence that requires attributing the original source of the licensed material.
BitTorrent: A peer-to-peer protocol for transferring very large files that spreads the required bandwidth across all the computers participating in the transfer. Rather than downloading a file from a single source, peers download pieces of it from each other.
Boolean logic: A form of algebra in which all values are reduced to either TRUE or FALSE.
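As a small illustration, the Python sketch below prints a truth table: every input and result is either True or False, and the operators `and`, `or` and `not` combine such values. The helper name `truth_table` is hypothetical, chosen only for this example.

```python
from itertools import product


def truth_table():
    """Return rows (a, b, a AND b, a OR b, NOT a) for every combination of inputs."""
    rows = []
    for a, b in product([False, True], repeat=2):
        rows.append((a, b, a and b, a or b, not a))
    return rows


for row in truth_table():
    print(row)
```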
Categorical Data: Data that sorts things into categories, e.g. country names, groups, conditions or tags.
Choropleth Map: A map in which values are encoded onto regions using color mapping: each whole region is colored according to its underlying value.
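One common way to choose a region's color is to split the range of values into equal-width classes and give each class one color from a palette. The sketch below assumes equal-interval classification (other schemes, such as quantile classes, are also widely used); the palette, the function name `choropleth_colors` and the region values are all hypothetical.

```python
# Illustrative sequential palette, light to dark (hex RGB); any palette would do.
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def choropleth_colors(values, palette=PALETTE):
    """Map each region's value to a palette color using equal-interval classes."""
    lo, hi = min(values.values()), max(values.values())
    k = len(palette)
    width = (hi - lo) / k or 1  # avoid division by zero when all values are equal
    colors = {}
    for region, value in values.items():
        # Clamp so the maximum value falls into the top class rather than past it.
        index = min(int((value - lo) / width), k - 1)
        colors[region] = palette[index]
    return colors


# Hypothetical regions and values.
region_colors = choropleth_colors({"North": 10, "South": 25, "East": 40, "West": 50})
print(region_colors)
```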
Continuous Data: Numerical data that, if you plot all possible values, has no gaps: between any two possible values there is always another. E.g. heights (you can be 155.55 cm or 155.56 cm tall, or anything in between). Compare to Discrete Data.
Crowdsourcing: A blend of "crowd" and "outsourcing": having many people each do a simple task so that, together, they complete a larger piece of work.
CSV (Comma-separated Values): A very simple, open format for tabular data that can be exported and imported by practically all spreadsheet applications and is easily manipulated with command line tools.
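A minimal sketch of reading CSV with Python's standard `csv` module follows. The sample text is hypothetical; its quoted field contains a comma, which is why a real CSV parser is safer than naively splitting each line on commas. For a file on disk, `open(path, newline="")` could replace the in-memory `io.StringIO`.

```python
import csv
import io

# Hypothetical sample data; the first value is quoted because it contains a comma.
SAMPLE = 'name,age\n"Smith, Jane",41\nBob,29\n'


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


rows = read_rows(SAMPLE)
for row in rows:
    print(row["name"], row["age"])  # all values are read as strings
```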
curl: A command line tool for transferring data to and from online systems over standard internet protocols, including FTP and HTTP. Very powerful and well suited to working with Web APIs from the command line. See curl.haxx.se
DAP (Data Access Protocol): A system that allows outsiders to be granted access to databases without overloading either the database or the system requesting the data.
Discrete Data: Numerical data that, if you plot all possible values, has gaps in it. E.g. counts of things (there is no such thing as 1.5 children). Compare to Continuous Data.
Etherpad: A piece of software for collaborative, real-time editing of text. See etherpad.org
GDP: Gross domestic product (GDP) is the market value of all officially recognized goods and services produced within a country in a given period of time. GDP per capita is often considered an indicator of a country’s standard of living. (Source: Wikipedia.)
Geocoding: From "geographical coding". The practice of attaching geographical coordinates to items.
GeoJSON: A format for encoding geographic data (points, lines, polygons and their associated properties) using JSON. It is standardised in RFC 7946.
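GeoJSON (specified in RFC 7946) represents geographic features as JSON objects, so any JSON library can produce or read it. The sketch below builds a one-feature FeatureCollection with Python's standard `json` module; the coordinates and the `name` property are illustrative values only. Note that GeoJSON lists positions as longitude first, then latitude.

```python
import json

# A single GeoJSON Feature: a Point geometry plus free-form properties.
# GeoJSON positions are [longitude, latitude]; these values are illustrative.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-0.1276, 51.5072]},
    "properties": {"name": "Example point"},
}
collection = {"type": "FeatureCollection", "features": [feature]}

# Serialise to text and parse it back, as a consumer of the file would.
text = json.dumps(collection)
parsed = json.loads(text)
lon, lat = parsed["features"][0]["geometry"]["coordinates"]
print(lon, lat)
```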
Intellectual property rights: Legal monopolies granted to individuals or organisations over their intellectual creations, e.g. copyright and patents.
JSON: JavaScript Object Notation. A common format for exchanging data. Although it is derived from JavaScript, libraries to parse JSON data exist for many programming languages. Its compact style and ease of use have made it widespread. To make viewing JSON in a browser easier, you can install a plugin such as JSONView, which is available for both Chrome and Firefox.
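As an example of such a library, Python's standard `json` module converts JSON text into native values (objects become dicts, arrays become lists, `true` becomes `True`) and back again. The record below is hypothetical.

```python
import json

# Hypothetical record as it might arrive from a Web API.
text = '{"name": "Example dataset", "tags": ["open", "csv"], "rows": 120, "public": true}'

record = json.loads(text)
print(record["tags"][0])  # prints: open

# Serialise back to (pretty-printed) JSON text.
print(json.dumps(record, indent=2))
```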
Machine-readable: Formats that are machine-readable are ones whose data can easily be extracted by computer programs. PDF documents are not machine-readable: computers can display their text nicely, but have great difficulty understanding the context that surrounds it. Common machine-readable file formats are CSV and Excel files.
Mean: The arithmetic mean of a set of values. Calculated by summing all values and then dividing by the number of values.
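The definition translates directly into code. A minimal Python sketch (Python's standard library also offers `statistics.mean`; the hand-written version is only to show the calculation):

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)


print(mean([2, 4, 4, 5]))  # (2 + 4 + 4 + 5) / 4 = 3.75
```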
Median: The value that splits a set of values in half: 50% of the values lie below it and 50% above it.
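A minimal Python sketch of the median: sort the values and take the middle one. With an even number of values there is no single middle value; this sketch follows the common convention (also used by Python's `statistics.median`) of averaging the two middle values.

```python
def median(values):
    """Middle value of the sorted data; for an even count, the mean of the two middle values."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))      # sorted: 1, 3, 7 -> 3
print(median([7, 1, 3, 10]))  # sorted: 1, 3, 7, 10 -> (3 + 7) / 2 = 5.0
```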
Normal Distribution: The normal (or Gaussian) distribution is a continuous probability distribution with a bell-shaped curve.
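Python's standard library models this distribution with `statistics.NormalDist`. The sketch below uses the standard normal distribution (mean 0, standard deviation 1) to show two properties of the bell curve: its density peaks at the mean, and roughly 68% of values fall within one standard deviation of it.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
dist = NormalDist(mu=0.0, sigma=1.0)

# Height of the bell curve at its peak (the mean).
peak = dist.pdf(0.0)

# Probability of a value lying within one standard deviation of the mean.
within_one_sd = dist.cdf(1.0) - dist.cdf(-1.0)

print(round(peak, 4), round(within_one_sd, 4))
```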
Open Data: Open data is data that can be used, reused and redistributed freely by anyone for any purpose. More details can be found at opendefinition.org.
Open standards: Generally understood as technical standards that are free from licensing restrictions. Can also be interpreted to mean standards that are developed in a vendor-neutral manner.
Percentiles: The nth percentile is the value below which n% of the values in a range fall. E.g. the 5th percentile: 5% of values are lower than this value.
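There are several slightly different ways to compute a percentile, especially for small datasets. The sketch below uses the simple nearest-rank method: sort the data and take the value at rank ⌈p/100 × n⌉. The function name and sample data are illustrative.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("percentile of empty data is undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank; multiply before dividing to keep the arithmetic exact for integer p.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


data = [15, 20, 35, 40, 50]
print(percentile_nearest_rank(data, 5))   # rank ceil(0.25) = 1 -> 15
print(percentile_nearest_rank(data, 40))  # rank ceil(2.0) = 2 -> 20
```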
Public domain: No copyright exists over the work. The concept does not exist in all jurisdictions.
Qualitative Data: Data that tells you something about qualities rather than measured amounts, e.g. descriptions or colors. Interviews count as qualitative data.
Quantitative Data: Data that tells you something about a measure or quantity, such as how many things you have or their measured size.
Quartiles: The values below which 25%, 50% and 75% of the values in a range fall. The second quartile (50%) is the median.
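Python's `statistics.quantiles` returns the three quartile cut points when called with `n=4`. As with percentiles, several conventions exist; the sketch relies on Python's default (`method="exclusive"`), and the data is simply the numbers 1 to 11 in arbitrary order.

```python
from statistics import median, quantiles

data = [7, 1, 11, 3, 9, 5, 2, 10, 4, 8, 6]  # the values 1..11 in arbitrary order

# n=4 splits the data into four groups and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4)
print(q1, q2, q3)  # 3.0 6.0 9.0

# The second quartile coincides with the median.
print(q2 == median(data))  # True
```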
Readme: A file (usually named README or README.txt) that explains to new users what the current directory or set of files is about. It is very commonly found in open source software projects and is considered good practice to include with various publications, including datasets. The file usually contains a short description of what to expect.
Scraping: The process of extracting data into machine-readable formats from sources that are not primarily data, e.g. webpages or PDF documents. Often prefixed with the source (web scraping, PDF scraping).
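A minimal web-scraping sketch using only Python's standard `html.parser` module: it collects the text of every table cell, row by row, turning an HTML table into a list of rows that could then be saved as CSV. The class name `TableScraper` and the HTML fragment are hypothetical; real pages often need more robust handling (nested tables, malformed markup, downloading the page first).

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Ignore text (such as whitespace) that is not inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment; a real scraper would first download the HTML.
PAGE = """
<table>
  <tr><th>Year</th><th>Value</th></tr>
  <tr><td>2020</td><td>1.5</td></tr>
  <tr><td>2021</td><td>2.0</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(PAGE)
print(scraper.rows)
```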
Share-alike Licence: A licence that requires users of a work to provide the content under the same or similar conditions as the original.
Tab-separated values: Tab-separated values (TSV) is a very common text file format for sharing tabular data: each line holds one row, and the fields within a row are separated by tab characters. The format is extremely simple and highly machine-readable.
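Because TSV differs from CSV only in its separator, Python's standard `csv` module can read and write it by setting `delimiter="\t"`. A minimal sketch with hypothetical sample text:

```python
import csv
import io

# Hypothetical TSV text: one row per line, fields separated by tab characters.
TSV_TEXT = "id\tcity\n1\tOslo\n2\tLima\n"

reader = csv.reader(io.StringIO(TSV_TEXT), delimiter="\t")
header, *rows = list(reader)
print(header)  # ['id', 'city']
print(rows)    # [['1', 'Oslo'], ['2', 'Lima']]

# Writing TSV uses the same delimiter; "\n" line endings match the input above.
out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows([header, *rows])
print(out.getvalue() == TSV_TEXT)  # True
```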
Taxonomy: Classification. Taxonomy refers to the hierarchical classification of things. One of the best known is the Linnaean classification of species, still used today to classify all living organisms.
Web API: An API that is designed to work over the Internet.