Labs newsletter: Q1 2015

Welcome to the first Labs Newsletter of 2015! There has been some great activity around open data and tech in the Open Knowledge network over the first quarter of 2015. Let’s dive straight in!

Labs <3 Discourse

In case you don’t know, Discourse is an open source forum/mailing list hybrid for communities. Open Knowledge runs a Discourse server, and of course there is a home there for the Open Knowledge Labs community. We hope to move community discussion there going forward, so check out the Open Knowledge Labs category, sign up, and set your digest preferences.

Labs hangouts

The first Open Knowledge Labs hangout of 2015 was held on April 16 to a full house, and the next one is currently scheduled for May 14. Check out the previous agenda, and planning for the next one, here at okfnpad.

Core datasets

Core datasets is a project for collecting and maintaining important and commonly used (“core”) datasets in a high-quality, standardized and easy-to-use form. There has been a lot of activity here, including a call for data curators (jump in if you are interested!). Currently, 35+ volunteers are contributing, with leadership from super contributor @sxren.

Most action takes place here with datasets then appearing on the frictionless data site.

Some notable recent contributions include:

Data Package libraries

Data Packages are a simple set of specifications for packaging data. Some great libraries have recently been released (and updated) for working with the Data Package format and related specs such as JSON Table Schema.
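For readers who have not seen one, here is a rough sketch of what a minimal Data Package descriptor (the datapackage.json file) might look like, built with nothing but the Python standard library. The package name, file path and column names are made up for illustration, and the helper function is hypothetical; it is not part of any of the libraries below.

```python
import json


def build_descriptor(name, csv_path, fields):
    """Return a minimal Data Package descriptor as a plain dict.

    `fields` is a list of (column name, JSON Table Schema type) pairs.
    This helper is illustrative only and not part of any library.
    """
    return {
        "name": name,
        "resources": [
            {
                "path": csv_path,
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical package describing a single CSV file.
descriptor = build_descriptor(
    "gdp-example",
    "data/gdp.csv",
    [("year", "integer"), ("gdp", "number")],
)

# This string is what would be saved as datapackage.json next to the data.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

The libraries listed below take care of reading, validating and managing descriptors like this one, so in practice you would rarely build them by hand.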

dpmr: Data Package management in R

dpmr is for working with Data Packages in R. Check it out here.

DataPak: Data Package management in Ruby

DataPak is for working with Data Packages in Ruby, and provides some really nice extras like managing your packages locally, SQL integration and more. Read the announcement on the Labs blog here, and check out the code here.

Data Package: Data Package management in Python

Data Package and Budget Data Package are Python packages for working with Data Packages. These libraries have been around for a while, but were recently updated to add Python 3 support. Check out Data Package here, and Budget Data Package here.

JTSKit: Working with JSON Table Schema in Python

JSON Table Schema is a specification for declaring schemas for data, and is used within Data Packages. JTSKit is a Python library for working with JSON Table Schema, providing interfaces for validating schemas, inferring schema from data, and a schema model class for easy use in Python code. Check it out here.
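To give a feel for what "inferring schema from data" means, here is a small standard-library sketch that guesses a JSON Table Schema from CSV text. It does not use JTSKit and is not how JTSKit works internally; the function names and the sample data are assumptions made for illustration, and it only distinguishes the integer, number and string types.

```python
import csv
import io


def _all_parse(values, cast):
    """Return True if every value can be converted with `cast`."""
    for value in values:
        try:
            cast(value)
        except ValueError:
            return False
    return True


def infer_type(values):
    """Guess a JSON Table Schema type name for a column of strings."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "string"  # nothing to go on, fall back to the loosest type
    if _all_parse(non_empty, int):
        return "integer"
    if _all_parse(non_empty, float):
        return "number"
    return "string"


def infer_schema(csv_text):
    """Build a schema dict from CSV text whose first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # Transpose rows into columns; a header-only file yields empty columns.
    columns = list(zip(*body)) if body else [() for _ in header]
    return {
        "fields": [
            {"name": name, "type": infer_type(list(column))}
            for name, column in zip(header, columns)
        ]
    }


# Hypothetical sample data.
sample = "id,price,city\n1,2.5,Berlin\n2,3,Paris\n"
schema = infer_schema(sample)
print(schema)
```

A real library would also handle dates, booleans, missing-value markers and sampling of large files, which this sketch leaves out.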

OCR PDF to Text

A new web service is available via Labs for converting documents (e.g. PDF) to text using OCR. Read the announcement here, and check out the code here.


GoodTables is a web service (and Python library/CLI) for validating tabular data. Read more about it in the announcement here, check out the web service here, and the library here.


ScraperWiki have released a new library for getting data out of spreadsheets. Read the announcement here, and check out the code here.

Council data visualisations and standards

Steve Bennett of Open Knowledge Australia has been doing some awesome work standardising and visualising council data in Victoria, Australia. He’s hoping to gain wider adoption of the standards that are emerging, in Australia and beyond. The standardisation work is happening here, on the OKFNAU repository on GitHub. See some of the data visualised on the Open Bin Map and Open Trees.

New data portal for Washington DC

Washington, DC’s data catalog has a new home. It operates on the ArcGIS Open Data platform and houses data relevant to city services in a variety of formats and with built-in APIs. The service is run out of the DC Office of the Chief Technology Officer, who have been quite responsive to issues and requests. You can give them a shout on Twitter as @opendatadc. Old datasets are still accessible here as they transition to the new site.

Remote data access wrapper for the Nomis API

Here’s an interesting blog post detailing work in Python/Pandas over the Nomis API, coming out of work Tony Hirst is doing teaching data wrangling for the UK Cabinet Office.

Get involved

Anyone can join the Labs community and get involved! Read more about how you can join the community and participate by coding, wrangling data, or doing outreach and engagement. Also check out the ideas page to see what’s cooking in the Labs, and the newsletter page if you have items to submit to the next newsletter.