Bad Data is a site detailing real-world examples of how not to prepare or provide data. It showcases poorly structured, misformatted, or just plain ugly datasets and what they get wrong.
While its primary purpose is to serve as an educational tool for governments and other organizations, it is also a good source of practice material for budding data wranglers—those tasked with cleaning and transforming data. The project began as a place to keep practice data for Data Explorer, another project of Open Knowledge Labs.
dpm (data package manager) is a library and
command-line tool for installing and
packages. Inspired by software package management tools
apt for Debian,
to reduce the friction of
sharing and working with data.
A FireFox extension which allows you to view, search, graph and map CSV files in the browser (built using Recline). This is an port of the great Rufus Pollock's chrome-csv-viewer on FireFox.
The Open Knowledge Labs website (i.e. the site you’re looking at right now) is itself a collaborative project of Open Knowledge. It is built using Jekyll, a static site generator, and hosted on GitHub Pages. You can contribute by addressing some of the items on the project’s GitHub issue queue (hint: you can start with the easy ones).
Head Start is intended for scholars who want to get an overview of a research field. They could be young PhDs getting into a new field, or established scholars who venture into a neighboring field. The idea is that you can see the main areas and papers in a field at a glance without having to do weeks of searching and reading. A prototypical implementation for the field of educational technology can be found on Mendeley Labs. The visualization is also used in Conference Navigator 3 and in the Organic Edunet portal.
A collection of the works of Ivo, bishop of Chartres.
This site has four elements
- draft texts and some concordances for the three collections traditionally
associated with Ivo of Chartres:
- the Collectio Tripartita
- the Decretum and
- the Panormia
- a draft list of manuscripts which contain a significant number of Ivo’s letters.
Originally developed as part of ReclineJS but now fully standalone.
Data Pipes is a service to provide streaming, "pipe-like" data transformations on the web – things like deleting rows or columns, find and replace, head, grep etc.
Reconcile-CSV is an OpenRefine reconciliation service running on top of a CSV file. It uses fuzzy matching to find the most likely candidates for matching. If you ever needed to join two datasets that didn't have unique identifiers and where things are written slightly different: this is a way to go.
Open Correspondence is an attempt to explore the letters network of the nineteenth century. At the moment, the project contains some of the letters of Charles Dickens, but we're working to expand to it include many other authors such as Jane Austen, George Eliot and Byron.
It also provides a backend interface to ElasticSearch suitable for use with the Recline suite of data libraries.
There’s too much friction working with data - friction getting data, friction processing data, friction sharing data.
This friction stops people doing stuff: stops them creating, sharing, collaborating, and using data - especially amongst more distributed communities. It kills the cycles of find, improve, share that would make for a dynamic, productive and attractive (open) data ecosystem.
We need to make an ecosystem that, like open-source for software, is useful and attractive to those without any principled interest, the vast majority who simply want the best tool for the job, the easiest route to their goal.
We think that by getting a few key pieces in place we can reduce friction enough to revolutionize how the (open) data ecosystem operates with massively improved data quality, utilization and sharing.
- Standards: A small set of lightweight ‘data package’ standards and patterns providing a base structure on which tooling and integration can build.
- Tooling and Integration: Making it easy to use and publish data packages from your existing apps and workflows whether that’s Excel, R, or Hadoop!
- Outreach and Community: Engaging and evangelizing around the concepts, standards and tooling and building a community of users and contributors.
The BubbleTree can be used to display hierarchical (spending) data in an interactive visualization. The setup is easy and independent from the OpenSpending platform. However, there is an optional integration module to connect with data from the OpenSpending API.
Nomenklatura de-duplicates and integrates different names for entities - people, organisations or public bodies - to help you clean up messy data and to find links between different datasets. The service will create references for all entities mentioned in a source dataset. It then helps you to define which of these entities are duplicates and what the canonical name for a given entity should be. This information is available in data cleaning tools like OpenRefine or in custom data processing scripts, so that you can automatically apply existing mappings in the future. The focus of nomenklatura is on data integration, it does not provide further functionality with regards to the people and organisations that it helps to keep track of.
Nomenklatura is a simple service that makes it easy to maintain a canonical list of entities such as persons, companies or event streets and to match messy input, such as their names against that canonical list – for example, matching Acme Widgets, Acme Widgets Inc and Acme Widgets Incorporated to the canonical "Acme Widgets".
With Nomenklatura its a matters of minutes to set up your own set of master data to match against and it provides a simple user interface and API which you can then use do matching (the API is compatible with Open Refine's reconciliation function).
Nomenklatura can not only store the master set of entities you want to match against but also will learn and record the various aliases for a given entity - such as a person, organisation or place - may have in various datasets.
CrowdCrafting is a free, open-source crowd-sourcing and micro-tasking platform powered by the PyBossa software. This platform enables people to create and run projects that utilize on-line assistance in performing tasks that require human cognition such as image classification, transcription, geocoding and more. CrowdCrafting is there to help researchers, civic hackers and developers to create projects where anyone around the world with some time, interest and an Internet connection can contribute.
Froide is a Freedom of Information portal written in Python using the Django Web framework. It manages contactable entities, requests and much more. Users can send emails to these entities and receive public answers via the platform.
It was developed to power Frag den Staat – the German Freedom of Information Portal, but is internationalized, localized and themable and has deployed in several different countries.
Germany Freedom of Information portal powered by the Froide platform. Responsible for managing over a 1/3 of all Freedom of Information request in Germany.
Visualization on violence against women in the brazilian state Rio Grande do Sul.
In a nutshell it is an open source platform for working with collections of texts. It enables students, researchers and teachers to share and collaborate around texts using a simple and intuitive interface.
Library for working with IATI data and converting it into a relational database. http://blog.okfn.org/2012/06/05/from-xml-to-visualisations-iati/
A chrome extension which allows you to view, search, graph and map CSV files in the browser (built using Recline)
Kartograph is a simple and lightweight framework for building interactive map applications without Google Maps or any other mapping service. It was created with the needs of designers and data journalists in mind.
Tools for parsing messy tabular data. http://okfnlabs.org/blog/2012/10/22/messytables.html
Python library and command line tool for converting data from one format to another. It builds on messytables, GDAL and many more great open-source libraries for processing data, and provides one easy to use standard API.
WikipediaJS is a simple JS library for accessing information in Wikipedia articles such as dates, places, abstracts etc. The library is the work of Labs member Rufus Pollock. In essence, it is a small wrapper around the data and APIs of the DBPedia project and it is they who have done all the heavy lifting of extracting structured data from Wikipedia - huge credit and thanks to DBPedia folks!
OffenesParlement is a site (and open-source codebase) for gathering and presenting information about the work of the Bundestag and Bundesrat.