Glossary
Open Data
Open data is data that can be used, reused and redistributed freely by anyone for any purpose. More details can be found at opendefinition.org.
Machine-readable
Machine-readable formats are ones from which computer programs can easily extract data. PDF documents are not machine-readable: computers can display the text nicely, but have great difficulty understanding the context that surrounds the text. Common machine-readable file formats are CSV and Excel files.
Readme
A file (usually named README or README.txt) that explains to new users what the current directory or set of files is about. Readme files are very common in open source software projects, and including one with other publications (including datasets) is considered good practice.
The file usually contains a short description of what to expect.
BitTorrent
BitTorrent is a protocol for distributing the bandwidth needed to transfer very large files among the computers participating in the transfer. Rather than downloading a file from a single source, BitTorrent allows peers to download from each other.
JSON
JavaScript Object Notation. A common format for exchanging data. Although it is derived from JavaScript, libraries to parse JSON data exist for many programming languages. Its compact style and ease of use have made it widespread. To make viewing JSON in a browser easier, you can install a plugin such as JSONView for Chrome or JSONView for Firefox.
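As a minimal sketch of what parsing JSON looks like in practice, the snippet below uses Python's standard `json` module. The record itself is invented purely for illustration.

```python
import json

# Hypothetical record, invented for illustration.
text = '{"name": "Ada", "languages": ["Python", "R"], "active": true}'

# JSON text -> Python dictionary
record = json.loads(text)
print(record["languages"][0])

# Python dictionary -> JSON text; indent makes the output easier to read.
pretty = json.dumps(record, indent=2)
print(pretty)
```

Note that the JSON value `true` becomes the Python value `True` after parsing.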
GDP
Gross domestic product (GDP) is the market value of all officially recognized goods and services produced within a country in a given period of time. GDP per capita is often considered an indicator of a country’s standard of living. (Source: Wikipedia.)
GeoJSON
GeoJSON is a format for encoding a variety of geographic data structures. It is based on the JSON specification. More documentation can be found on http://www.geojson.org.
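Because GeoJSON is ordinary JSON, it can be built with any JSON library. The sketch below assembles a single point feature with Python's standard `json` module; the place name and properties are illustrative assumptions. GeoJSON lists coordinates as longitude first, then latitude.

```python
import json

# One point feature. Coordinates are [longitude, latitude].
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [13.405, 52.52]},
    "properties": {"name": "Berlin"},  # free-form attributes, chosen for illustration
}

# Features are usually grouped into a FeatureCollection.
collection = {"type": "FeatureCollection", "features": [feature]}

geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```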
Geocoding
From Geographical Coding. Describes the practice of attaching geographical coordinates to items.
Geocode
See Geocoding
CSV
Comma Separated Values. A very simple, open format for tabular data which can be exported and imported by all spreadsheet applications and is easily manipulable with command line tools.
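To illustrate how simple the format is, here is a small sketch that reads CSV text with Python's standard `csv` module. The table contents are made up for the example.

```python
import csv
import io

# Made-up table: the first line holds the column names.
csv_text = "city,visitors\nA,10\nB,32\n"

# DictReader turns each data row into a dictionary keyed by column name.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every value arrives as text, so numbers must be converted explicitly.
total_visitors = sum(int(row["visitors"]) for row in rows)
print(rows)
print(total_visitors)
```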
Comma-separated Values
See CSV
curl
http://curl.haxx.se/ - a command line tool for transferring data to and from online systems over standard internet protocols including FTP and HTTP. Very powerful and great for working with Web APIs from the command line.
DAP
See Data Access Protocol.
Data Access Protocol
A system that allows outsiders to be granted access to databases without overloading either system.
etherpad
A piece of software for collaborative real-time editing of text. See http://etherpad.org/.
Attribution Licence
A licence that requires attributing the original source of the licensed material.
API
See Application Programming Interface.
Application Programming Interface
A way computer programmes talk to one another. Can be understood in terms of how a programmer sends instructions between programmes.
Web API
An API that is designed to work over the Internet.
Share-alike Licence
A licence that requires users of a work to provide the content under the same or similar conditions as the original.
Public domain
No copyright exists over the work. Does not exist in all jurisdictions.
Open standards
Generally understood as technical standards which are free from licensing restrictions. Can also be interpreted to mean standards which are developed in a vendor-neutral manner.
Anonymisation
The process of treating data such that it cannot be used for the identification of individuals.
IP rights
See Intellectual property rights.
Intellectual property rights
Monopolies granted to individuals for intellectual creations.
Tab-separated values
Tab-separated values (TSV) are a very common text file format for sharing tabular data. The format is extremely simple and highly machine-readable.
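A TSV file differs from a CSV file only in its delimiter. As a rough sketch, the snippet below converts a made-up CSV table into TSV using Python's standard `csv` module with `delimiter="\t"`.

```python
import csv
import io

# Made-up CSV table used as input.
csv_text = "city,visitors\nA,10\n"
rows = list(csv.reader(io.StringIO(csv_text)))

# Write the same rows back out with a tab as the delimiter.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)

tsv_text = buffer.getvalue()
print(tsv_text)
```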
Taxonomy
Classification. Taxonomy refers to the hierarchical classification of things. One of the best known is the Linnaean classification of species, still used today to classify all living beings.
Qualitative Data
Qualitative data is data telling you something about qualities, e.g. descriptions, colors etc. Interviews count as qualitative data.
Quantitative Data
Quantitative data tells you something about a measure or quantification, such as the quantity of things you have, their size (if measured) etc.
Crowdsourcing
A blend of "crowd" and "outsourcing": having a lot of people each do simple tasks that together complete the whole work.
Choropleth Map
A choropleth map is a map where values are encoded onto regions using color mapping. Each region is colored as a whole according to its underlying value.
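The color mapping step can be sketched without any mapping library. The hypothetical helper `value_to_hex` below linearly blends between white (lowest value) and a dark blue (highest value); the two end colors and the region values are arbitrary choices for illustration, and the function assumes `vmax` is greater than `vmin`.

```python
def value_to_hex(value, vmin, vmax):
    """Map a value to a hex color between white (low) and dark blue (high).

    Assumes vmax > vmin. Values outside the range are clamped.
    """
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))
    low = (255, 255, 255)   # white
    high = (8, 48, 107)     # dark blue, an arbitrary choice
    rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Made-up values per region.
region_values = {"North": 0, "Centre": 50, "South": 100}
region_colors = {
    name: value_to_hex(v, vmin=0, vmax=100) for name, v in region_values.items()
}
print(region_colors)
```

A mapping tool would then fill each region's shape with its color.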
Mean
The arithmetic mean of a set of values. Calculated by summing up all values and then dividing by the number of values.
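As a short worked example with made-up values, the mean can be computed by hand or with Python's standard `statistics` module:

```python
import statistics

values = [2, 4, 4, 10]  # made-up sample

# Sum all values, then divide by how many there are: 20 / 4
manual_mean = sum(values) / len(values)

# The standard library gives the same result.
library_mean = statistics.mean(values)

print(manual_mean, library_mean)
```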
Normal Distribution
The normal (or Gaussian) distribution is a continuous probability distribution with a bell-shaped curve.
Median
The median is the value that splits a sorted set of values in half: 50% of the values lie below it and 50% lie above it.
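A small sketch with made-up values, using Python's standard `statistics` module. It also shows that one extreme value pulls the mean upwards far more than the median.

```python
import statistics

values = [1, 3, 5, 7, 100]  # made-up sample with one extreme value

# Odd number of values: the median is the middle one once sorted.
median_odd = statistics.median(values)

# Even number of values: the median is the mean of the two middle ones.
median_even = statistics.median([1, 3, 5, 7])

mean_value = statistics.mean(values)
print(median_odd, median_even, mean_value)
```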
Quartiles
Quartiles are the values where 25, 50 and 75% of values in a range are below the given value.
Percentiles
The nth percentile is the value below which n% of the values in a given range fall, e.g. the 5th percentile: 5 percent of values are lower than this value.
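Quartiles and percentiles can both be sketched with `statistics.quantiles` from Python's standard library (Python 3.8 or later). Several conventions exist for interpolating between data points, so other tools may report slightly different cut points; this example simply uses the function's default method on a made-up sample.

```python
import statistics

data = list(range(1, 101))  # made-up sample: the numbers 1 to 100

# n=4 returns the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)

# n=100 returns 99 cut points; index 4 is the 5th percentile.
percentiles = statistics.quantiles(data, n=100)
p5 = percentiles[4]

# Count how many values fall below each cut point.
below_q1 = sum(v < q1 for v in data)
below_p5 = sum(v < p5 for v in data)
print(q1, q2, q3, p5, below_q1, below_p5)
```

The second quartile is the same value as the median.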
Scraping
The process of extracting data in machine-readable formats from non-pure data sources, e.g. webpages or PDF documents. Often prefixed with the source (web-scraping, PDF-scraping).
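As a rough illustration of web-scraping, the sketch below pulls the cells of an HTML table into rows using only Python's standard `html.parser` module. The class name `CellCollector` and the HTML snippet are invented for this example; real pages are usually messier and are often handled with dedicated scraping libraries.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of table cells, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Made-up HTML standing in for a downloaded webpage.
html = (
    "<table>"
    "<tr><th>City</th><th>Visitors</th></tr>"
    "<tr><td>A</td><td>10</td></tr>"
    "</table>"
)

parser = CellCollector()
parser.feed(html)
print(parser.rows)
```

The resulting rows could then be written out as CSV.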
Categorical Data
Data that helps put things into categories, e.g. country names, groups, conditions, tags.
Discrete Data
Numerical data that, if you plot all possible values, has gaps in it. E.g. the count of things (there are no 1.5 children). Compare to Continuous Data.
Continuous Data
Numerical data that, if you plot all possible values, has no gaps. E.g. sizes (you can be 155.55 or 155.56 cm tall etc.). Compare to Discrete Data.
Boolean logic
A form of algebra in which all values are reduced to either TRUE or FALSE.
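As a brief sketch, Python writes these two values as `True` and `False` and combines them with `and`, `or` and `not`. The snippet builds a truth table covering every combination of two inputs.

```python
from itertools import product

# Each entry: (a, b, a AND b, a OR b, NOT a)
truth_table = [
    (a, b, a and b, a or b, not a)
    for a, b in product([True, False], repeat=2)
]

for row in truth_table:
    print(row)
```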