The Data Wrangling Blog
-
This is the first in a series of regular updates on the Labs project http://data.okfn.org/ and summarizes some of the changes and improvements over the last few weeks. 1. Refactor of site layout and focus. We've done a refactor of the site to have a stronger focus on the data....
-
Our next Open Humanities Hangout will take place next Tuesday, 28th May. This is the latest in the series of regular hangouts we've been organizing over the past few months with people interested in tapping into the growing amount of open cultural data and...
-
Nomenklatura is a simple service that makes it easy to maintain a canonical list of entities such as persons, companies or even streets and to match messy input, such as their names, against that canonical list – for example, matching Acme Widgets, Acme Widgets Inc...
-
This is an update on PublicBodies.org - a Labs project whose aim is to provide a "URL for every part of Government": http://publicbodies.org/ PublicBodies.org is a database and website of "Public Bodies" – that is Government-run or controlled organizations (which may or may not have...
-
I'm playing around with some large(ish) CSV files as part of an OpenSpending-related data investigation to look at UK government spending last year -- example question: which companies were the top 10 recipients of government money? (More details can be found in this issue...
-
sqlaload is a small library that I use to handle databases in Python data processing. In many projects, your process starts with very messy data (something you've scraped or loaded from a hand-prepared Excel sheet). In subsequent stages, you gradually add cleaned values in new...
-
At the Culture Labs hangout yesterday we wrote up the plans for the next steps for Textus we have been discussing over the last few months. The result is this slide deck overview. It both introduces Textus and outlines next steps (slide 12 onwards). Key...
-
This is an update on progress with the Data Explorer (aka Data Transformer). Progress is best seen from this demo which takes you on a tour of house prices and the difference between real and nominal values. More information on recent developments can be found...
-
Over time Recline JS has grown. In particular, since the first public announcement of Recline last summer we've had several people producing new backends and views (e.g. backends for Couch, a view for d3, a map view based on Ordnance Survey's tiles etc etc). As...
-
I'm really happy to announce that today we have finally added a feature that allows you to export your data in CSV format with just one click (we also support the same for JSON). For this purpose, all the applications in PyBossa now feature...
-
Last Saturday, the 26th of January, Mozilla held a hack day, the #FirefoxOSAppDay, in parallel in 25 cities all over the world, about creating new web applications for their new FirefoxOS mobile OS and the desktop web browser (this is still in beta and alpha mode!)....
-
In the last few weeks we have been working hard to make it easier to develop new PyBossa applications. For this reason, we are happy to announce a new version of PyBossa.JS. This new version introduces several improvements: Creating an app is much easier! You...
-
At the Open Interests hackday in November, a discussion with Martin Stabe from the FT's interactive desk led to a prototype of Journoid. The idea is to monitor changing on-line datasets for remarkable information, like earthquakes, procurement in a particular industry or a close parliamentary vote....
-
I've traditionally used Python for web scraping but I'd been increasingly thinking about using Node given that it is pure JS and therefore could be a more natural fit when getting info out of web pages. In particular, when my first steps when looking to...
-
There are many circumstances where you want to archive tweets - maybe just from your own account or perhaps for a hashtag for an event or topic. Unfortunately Twitter search queries do not give data more than 7 days old and for a given...
-
If you compare software code and legislation you can find many similarities: both are big bodies of text spread over multiple units (laws/files). The total amount of text inevitably grows bigger over time with many small changes to existing parts while most of the corpus...
-
Thanks to the free crowd-crafting tool PyBossa, nowadays the biggest challenge for successful crowd-sourcing is engaging users to participate in tasks and keeping that motivation at a high level over time. Therefore the user experience of crowd-sourcing apps plays a crucial role. After participating...
-
This post is a rough and ready overview of various JavaScript timeline libraries that arose from research in creating a timeline view for Recline JS. Note this material hung around on my hard disk for a few months so some of it may already be...
-
Making sense of massive datasets that document the processes of lobbying and public procurement at European Union level is not an easy task. Yet a group of 25 journalists, developers, graphic designers and activists worked together at the Open Interests Europe hackathon last weekend...
-
How much does the highest paid person in the Brazilian Federal Senate earn? That's the question I asked myself a few weeks ago, and one that should be easy to answer. In Brazil, every public body must publish its employees' salaries online, but some do...
-
We've recently finished a demo for ReclineJS showing how it can be used to build JS-based (ajax-style) search interfaces in minutes (or even seconds!): http://reclinejs.com/demos/search/ Because of Recline's pluggable backends you get out of the box support for data sources such as SOLR, Google Spreadsheet,...
-
We're having the next Show and Tell on Friday, 26 October at 2:30 pm BST via Google Hangout on Air. As usual, the URL will be posted on OKFN Labs' G+ Page. If you'd like to present, add your name to the list. Remember, #okfn...
-
One of the largest data collection projects we have done so far has been the consolidation of the UK's departmental expenditure. Over 370 different government entities have published a total of more than 7000 spreadsheets. Many of those have obviously been hand-crafted or at least...
-
The European Journalism Centre and the Open Knowledge Foundation, sponsored by Knight-Mozilla OpenNews, invite you to the Open Interests Hackathon to track the interests and money flows which shape European policy. When: 24-25 November Where: Google Campus Cafe, 4-5 Bonhill Street, EC2A 4BX London...
-
Built an app or tool you want to show people? Played around with some interesting data? Know of a new development people should know about? Want to find out what others are doing? Come to the Show and Tell this Friday and share what you...
-
Last week, Matej Kurian published a message on the okfn-labs mailing list, describing the various sources he had discovered for machine-readable excerpts of the EU's joint procurement system, TED. What struck me about this message was that, apparently, this polite and brilliant policy wonk had...
-
WikipediaJS is a simple JS library for accessing information in Wikipedia articles such as dates, places, abstracts etc. The library is the work of Labs member Rufus Pollock. In essence, it is a small wrapper around the data and APIs of the DBPedia...
-
As part of the Recline launch I quickly put together some very simple demo apps, one of which was called Timeliner: http://timeliner.reclinejs.com/ This uses the Recline timeline component (which itself is a relatively thin wrapper around the excellent Verite timeline) plus the Recline Google docs...
-
This is a brief post to announce an alpha prototype version of the Data Transformer, an app to let you clean up data in the browser using JavaScript: http://transformer.datahub.io/ 2m overview video: What does this app do? You load a CSV file from github (fixed...
-
Labs member Daniel Lombraña González has built a 3-d globe showing the locations of urban parks around the world as located by volunteers using the PyBossa Urban Park geocoding app: http://teleyinex.github.com/pybossa-urbanpark-globe/ (Source code) Background The Urban Parks geo-coding application is a micro-tasking app running...
-
On June 21st, the Knight News Challenge Round on Data ended. The day before, Rufus, Ross and I sat down to write out some ideas that we'd been discussing for a while. While we submitted proposals for Grano and DataProtocols, we decided to hold back...
-
On June 21st, the Knight News Challenge Round on Data ended. The day before, Rufus, Ross and I sat down to write out some ideas that we'd been discussing for a while. The first idea I want to repost here is a proposal for Grano,...
Have your say!
Do you have a topic that you'd like to write about? We love guest posts, just contact us at: labs (at) okfn.org.
Blogroll
Some places we find inspiration (where we steal ideas):
- Code for America Labs
- World Bank Dataviz and Mapping for Results
- ProPublica Nerd Blog
- Government Digital Service
- Tactical Technology: Reveal
- DevelopmentSeed
- vis4.net
- DataDrivenJournalism.net
- Open Institute - Kenya
- IRE: Behind the Story
- Sunlight Labs Blog and Sunlight Reporting
- Organized Crime and Corruption Reporting Project
- ScraperWiki Data Blog











