<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>OKFN Labs: The Data Wrangling Blog</title>
 <link href="http://okfnlabs.org/blog/feed.xml" rel="self"/>
 <link href="http://okfnlabs.org/"/>
 <updated>2013-06-10T12:27:23-07:00</updated>
 <id>http://okfnlabs.org/</id>
 <author>
   <name>The Open Knowledge Foundation</name>
 </author>

 
 <entry>
   <title>data.okfn.org - update no. 1</title>
   <link href="http://okfnlabs.org/blog/2013/05/28/data-okfn-org-update-no-1.html"/>
   <updated>2013-05-28T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2013/05/28/data-okfn-org-update-no-1</id>
   <content type="html">&lt;p&gt;This is the first of regular updates on Labs project &lt;a href=&quot;http://data.okfn.org/&quot;&gt;http://data.okfn.org/&lt;/a&gt;
and summarizes some of the changes and improvements over the last few weeks.&lt;/p&gt;

&lt;h3&gt;1. Refactor of site layout and focus.&lt;/h3&gt;

&lt;p&gt;We've refactored the site to give it a stronger focus on the data. The front page tagline is now:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;We're providing key datasets in high quality, easy-to-use and open form&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Tools and standards are there in a clear supporting role. Thanks for all the suggestions and feedback on this; more is welcome - we're still iterating.&lt;/p&gt;

&lt;h3&gt;2. Pull request data workflow&lt;/h3&gt;

&lt;p&gt;There was a nice example of the pull request data workflow being used (by a complete stranger!): &lt;a href=&quot;https://github.com/datasets/house-prices-uk/pull/1&quot;&gt;https://github.com/datasets/house-prices-uk/pull/1&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;3. New datasets&lt;/h3&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;US house prices &lt;a href=&quot;http://data.okfn.org/data/house-prices-us&quot;&gt;http://data.okfn.org/data/house-prices-us&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Annual consumer price index &lt;a href=&quot;http://data.okfn.org/data/cpi&quot;&gt;http://data.okfn.org/data/cpi&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Looking to contribute data? Check out the instructions at &lt;a href=&quot;http://data.okfn.org/about/contribute#data&quot;&gt;http://data.okfn.org/about/contribute#data&lt;/a&gt; and the outstanding requests: &lt;a href=&quot;https://github.com/datasets/registry/issues&quot;&gt;https://github.com/datasets/registry/issues&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;4. Tooling&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We have a DataPackage.JSON creator tool in progress at http://data.okfn.org/tools/dp/create (&lt;a href=&quot;https://github.com/okfn/data.okfn.org/issues/28&quot;&gt;here's the relevant github issue&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;We have a new &lt;a href=&quot;http://data.okfn.org/tools&quot;&gt;data package viewer created by James Smith&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;5. Feedback on standards&lt;/h3&gt;

&lt;p&gt;There's been a lot of valuable feedback on the &lt;a href=&quot;http://data.okfn.org/standards&quot;&gt;data package and json table schema standards&lt;/a&gt; including some quite major suggestions (e.g. substantial change to JSON Table Schema to align more closely with JSON Schema - thx to jpmckinney)&lt;/p&gt;

&lt;h3&gt;Next steps&lt;/h3&gt;

&lt;p&gt;There's plenty more coming up soon in terms of data and the site and tools.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complete the &lt;a href=&quot;https://github.com/okfn/data.okfn.org/issues/28&quot;&gt;datapackage.json generator&lt;/a&gt; (support for gdocs especially)&lt;/li&gt;
&lt;li&gt;Complete the &lt;a href=&quot;https://github.com/okfn/data.okfn.org/issues/27&quot;&gt;datapackage.json validator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;More &lt;a href=&quot;http://data.okfn.org/about/contribute#data&quot;&gt;datasets especially key indices&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;Get Involved&lt;/h3&gt;

&lt;p&gt;Anyone can contribute and it's easy -- if you can use a spreadsheet you can help!&lt;/p&gt;

&lt;p&gt;Instructions for getting involved here: &lt;a href=&quot;http://data.okfn.org/about/contribute&quot;&gt;http://data.okfn.org/about/contribute&lt;/a&gt;&lt;/p&gt;
</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>Open Humanities Hangout - Open Correspondence and the Letter Net</title>
   <link href="http://okfnlabs.org/blog/2013/05/21/humanities-hangout-open-correspondence-letter-net.html"/>
   <updated>2013-05-21T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2013/05/21/humanities-hangout-open-correspondence-letter-net</id>
   <content type="html">&lt;p&gt;Our next Open Humanities Hangout will take place next &lt;strong&gt;Tuesday, 28th May&lt;/strong&gt;. This is the latest in the series of regular hangouts we've been organizing over the past few months with people interested in tapping in to the growing amount of &lt;strong&gt;open cultural data and content&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;What:&lt;/strong&gt; Open Humanities Hangout looking at opening up historical correspondence and mapping the &quot;letter net&quot; &amp;ndash; e.g. did Dickens write to George Eliot and did she write back? Come help us find out! &lt;a href=&quot;#more&quot;&gt;Read more below&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;When:&lt;/strong&gt; Tuesday 28th May 2013 at 17:00 BST, 12:00 EDT, 18:00 CEST&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Where:&lt;/strong&gt; Online via Google Hangout and &lt;a href=&quot;/contact&quot;&gt;IRC&lt;/a&gt; &amp;ndash; we'll publish the hangout url nearer the time&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Who:&lt;/strong&gt; anyone who loves the humanities and wants to see the great works of our past accessible and re-usable by anyone regardless of background or location.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Signup:&lt;/strong&gt; please &lt;a href=&quot;https://docs.google.com/a/okfn.org/document/d/1WIzi7n3D5_c7QtaGKQAFbm7bMGmi-u_vjmI5NX8MWJA/edit#&quot;&gt;sign up here&lt;/a&gt; or email sam.leon@okfn.org. Note you can always just drop in on the day but it helps us if we have a sense of numbers!&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;About the Hangouts&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;/events/hangouts/&quot;&gt;Humanities Hangouts&lt;/a&gt; are an informal virtual get together to build apps and insights using open cultural material. Among other things participants have put together an app that helps you to get to know Shakespeare better called &lt;a href=&quot;http://crowdcrafting.org/app/bardomatic/&quot;&gt;Bardomatic&lt;/a&gt;, hacked on an annotation tool for public domain texts called &lt;a href=&quot;http://textusproject.org&quot;&gt;TEXTUS&lt;/a&gt; and created interactive &lt;a href=&quot;http://timeliner.okfnlabs.org/&quot;&gt;timelines of the great Western medieval philosophers&lt;/a&gt; (helping to improve and de-bug the &lt;a href=&quot;http://timeliner.okfnlabs.org/&quot;&gt;Timeliner tool&lt;/a&gt; in the process).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://farm6.staticflickr.com/5323/8768093210_3343870b2a_c.jpg&quot; alt=&quot;Screen Shot 2013-05-16 at 13.26.10&quot; width=&quot;1272&quot; height=&quot;768&quot; class=&quot;aligncenter size-full wp-image-2185&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;more&quot;&gt;The Challenge: Mapping Networks of Correspondence&lt;/h2&gt;


&lt;p&gt;We want to construct a workflow that will enable &lt;em&gt;anyone&lt;/em&gt; to take a published set of letters and turn it into open data and content that we can explore and visualize. Ultimately we want the network of correspondence &amp;ndash; the &quot;letter net&quot;.&lt;/p&gt;

&lt;h3&gt;Suggested process&lt;/h3&gt;

&lt;p&gt;This is something to discuss on the hangout, but we think the effort will involve at least 3 steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Locate published collection of letters

&lt;ul&gt;
&lt;li&gt;Great if these are already digitized on gutenberg&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Extract structured data like author, recipient, date, location

&lt;ul&gt;
&lt;li&gt;Geo-code all those locations&lt;/li&gt;
&lt;li&gt;If the texts are not digitized start thinking about that!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Visualise the results&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;We've already done work on steps 1 and even 2 in the &lt;a href=&quot;https://github.com/okfn/openletters&quot;&gt;case of Dickens&lt;/a&gt;. For geocoding there's already a simple &lt;a href=&quot;http://schoolofdata.org/2013/02/19/geocoding-part-i-introduction-to-geocoding/&quot;&gt;geocoding guide on the School of Data&lt;/a&gt;. For visualization there are plenty of options that we'll explore on the hangout. (And if we want to start scanning and OCRing there are &lt;a href=&quot;http://www.diybookscanner.org/&quot;&gt;manuals on how to build your own scanner&lt;/a&gt;).&lt;/p&gt;
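
&lt;p&gt;To make the target of step 2 concrete, here is a minimal Python sketch of how extracted letter records could be turned into a &quot;letter net&quot; and queried for replies. The records, field names and helper functions are made up for illustration and are not taken from the openletters code.&lt;/p&gt;

```python
from collections import Counter

# Made-up illustrative records, not real data: each extracted letter is reduced
# to author, recipient, date and location. The field names are hypothetical.
letters = [
    {"author": "Writer A", "recipient": "Writer B", "date": "1850-01-05", "location": "London"},
    {"author": "Writer B", "recipient": "Writer A", "date": "1850-01-20", "location": "Coventry"},
    {"author": "Writer A", "recipient": "Writer C", "date": "1851-03-02", "location": "London"},
]


def build_letter_net(records):
    """Count letters per directed (author, recipient) pair."""
    return Counter((r["author"], r["recipient"]) for r in records)


def wrote_back(net, a, b):
    """True if a wrote to b and b also wrote to a."""
    return net[(a, b)] > 0 and net[(b, a)] > 0


net = build_letter_net(letters)
print(wrote_back(net, "Writer A", "Writer B"))
```

&lt;p&gt;The directed pair counts are the edge list of the network, so they could be handed to whichever visualization tool is chosen for step 3.&lt;/p&gt;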

&lt;h3&gt;Our Goal&lt;/h3&gt;

&lt;p&gt;Our basic goal is a set of beautiful and insightful visualisations about the correspondence of key cultural figures.&lt;/p&gt;

&lt;p&gt;Longer term we would love to see a database of correspondence that is open to everyone to use and add to.&lt;/p&gt;
</content>
   <author>
     <name>Sam Leon</name>
   </author>
 </entry>
 
 <entry>
   <title>Nomenklatura - Data Matching and Reconciliation Made Easy</title>
   <link href="http://okfnlabs.org/blog/2013/05/16/nomenklatura-matching-service-reconciliation-made-easy.html"/>
   <updated>2013-05-16T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2013/05/16/nomenklatura-matching-service-reconciliation-made-easy</id>
   <content type="html">&lt;p&gt;&lt;a href=&quot;http://nomenklatura.okfnlabs.org/&quot;&gt;Nomenklatura&lt;/a&gt; is a simple service that makes it easy to maintain a canonical list of entities such as persons, companies or event streets and to match messy input, such as their names against that canonical list &amp;ndash; for example, matching Acme Widgets, Acme Widgets Inc and Acme Widgets Incorporated to the canonical &quot;Acme Widgets&quot;.&lt;/p&gt;

&lt;p&gt;With Nomenklatura it's a matter of minutes to set up your own set of master data to match against, and it provides a simple user interface and &lt;a href=&quot;http://nomenklatura.okfnlabs.org/about&quot;&gt;API&lt;/a&gt; which you can then use to do matching (the API is compatible with Open Refine's reconciliation function).&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://nomenklatura.okfnlabs.org/&quot;&gt;Nomenklatura&lt;/a&gt; can not only store the master set of entities you want to match against but will also learn and record the various aliases that a given entity - such as a person, organisation or place - may have in various datasets.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://nomenklatura.okfnlabs.org/&quot;&gt;&lt;img src=&quot;http://i.imgur.com/h9411NU.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As such, Nomenklatura chooses a design halfway between an entity database (such as OpenCorporates, PopIt or similar services) and automated de-duplication software (such as dedupe or SILK).&lt;/p&gt;

&lt;p&gt;Nomenklatura has been battle-tested with real-world usage, for example to de-duplicate the names of &lt;a href=&quot;http://nomenklatura.okfnlabs.org/offenesparlament&quot;&gt;German parliamentarians&lt;/a&gt;, &lt;a href=&quot;http://nomenklatura.okfnlabs.org/uk25k-departments&quot;&gt;UK government departments&lt;/a&gt; and &lt;a href=&quot;http://nomenklatura.okfnlabs.org/openinterests-entities&quot;&gt;spending data schemas and EU lobbyists&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Typically, a data extraction process will check all the entity names it discovers in the source data against Nomenklatura's API. If Nomenklatura does not recognize a name, a new alias record is stored as a placeholder. This alias can then be matched to an entity by the user through a simple-to-use reconciliation user interface.&lt;/p&gt;
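
&lt;p&gt;The lookup-then-reconcile cycle described above can be sketched in a few lines of Python. This is an in-memory illustration of the idea only: the AliasStore class and its methods are hypothetical and are not the Nomenklatura API or the pynomenklatura client.&lt;/p&gt;

```python
class AliasStore:
    """In-memory stand-in for the entity/alias idea (not the Nomenklatura API)."""

    def __init__(self, entities):
        self.entities = set(entities)
        # alias -> canonical entity name, or None while still unmatched
        self.aliases = {name: name for name in self.entities}

    def lookup(self, name):
        """Return the canonical name, or None after storing a placeholder alias."""
        if name not in self.aliases:
            self.aliases[name] = None  # placeholder awaiting manual review
        return self.aliases[name]

    def match(self, alias, entity):
        """Record a reviewer's decision that alias refers to entity."""
        if entity not in self.entities:
            raise ValueError("unknown entity: %s" % entity)
        self.aliases[alias] = entity

    def unmatched(self):
        """List aliases that still need a manual decision."""
        return sorted(a for a, e in self.aliases.items() if e is None)


store = AliasStore(["Acme Widgets"])
first = store.lookup("Acme Widgets Inc")         # unknown: placeholder stored
store.match("Acme Widgets Inc", "Acme Widgets")  # manual reconciliation step
second = store.lookup("Acme Widgets Inc")        # now resolves to the entity
```

&lt;p&gt;Once a reviewer has matched an alias, every later lookup of the same spelling resolves without further manual work, which is what makes the manual approach affordable.&lt;/p&gt;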

&lt;p&gt;To kickstart such a process, data can be uploaded via CSV - but new entities can be created dynamically as well. The advantage of a manual approach is that it minimizes the risk of false matches -- this level of quality assurance can be crucial, if, for example, the output will be displayed in an application that is intended to hold government to account.&lt;/p&gt;

&lt;h2&gt;This Release&lt;/h2&gt;

&lt;p&gt;This latest release of Nomenklatura includes a number of important changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The domain model was refactored to use a clearer naming scheme: canonical values are now called &quot;entities&quot;, and their alternative spellings are now &quot;aliases&quot;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CSV upload support allows users to submit a list of entities, aliases or fully executed mappings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support for the Open Refine API was added, so that each Nomenklatura dataset can be added as a reconciliation service and used to clean data from inside Refine.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keyboard shortcuts were added to the reconciliation tool, so that matches can be identified without using a mouse - a fast user can now match a few hundred records an hour.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Python client library has been refactored and submitted to PyPI; it can be installed via &quot;pip install pynomenklatura&quot;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;Credits and Links&lt;/h2&gt;

&lt;p&gt;Nomenklatura was developed by &lt;a href=&quot;/members/pudo/&quot;&gt;Labs Member Friedrich Lindenberg&lt;/a&gt; with contributions from other folks including fellow Labs member Michael Bauer.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/pudo/nomenklatura&quot;&gt;Nomenklatura source code on GitHub&lt;/a&gt;&lt;/p&gt;
</content>
   <author>
     <name>Friedrich Lindenberg</name>
   </author>
 </entry>
 
 <entry>
   <title>Update on PublicBodies.org - a URL for every part of Government</title>
   <link href="http://okfnlabs.org/blog/2013/05/01/publicbodies.org-an-update.html"/>
   <updated>2013-05-01T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2013/05/01/publicbodies.org-an-update</id>
   <content type="html">&lt;p&gt;This is an update on &lt;a href=&quot;http://publicbodies.org/&quot;&gt;PublicBodies.org&lt;/a&gt; - a Labs project whose aim is to provide a &quot;URL for every part of Government&quot;: &lt;a href=&quot;http://publicbodies.org/&quot;&gt;http://publicbodies.org/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PublicBodies.org is a database and website of &quot;Public Bodies&quot; &amp;ndash; that is Government-run or controlled organizations (which may or may not have distinct corporate existence). Examples would include government ministries or departments, state-run organizations such as libraries, police and fire departments and more.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://publicbodies.org/&quot;&gt;&lt;img src=&quot;http://i.imgur.com/2AbIjSu.png&quot; alt=&quot;&quot; style=&quot;margin-top: 15px; margin-bottom: 15px;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We run into public bodies all the time in projects like OpenSpending (either as spenders or recipients). Back in 2011 as part of the &quot;Organizations&quot; data workshop at OGD Camp 2011, Labs member Friedrich Lindenberg scraped together a first database and site of &quot;public bodies&quot; from various sources (primarily FoI sites like WhatDoTheyKnow, FragDenStaat and AskTheEU).&lt;/p&gt;

&lt;p&gt;We've recently redone the site converting the sqlite DB to simple flat CSV files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Main github repo: &lt;a href=&quot;https://github.com/okfn/publicbodies&quot;&gt;https://github.com/okfn/publicbodies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Example raw CSV: &lt;a href=&quot;https://raw.github.com/okfn/publicbodies/master/data/gb.csv&quot;&gt;https://raw.github.com/okfn/publicbodies/master/data/gb.csv&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The site itself is now super-simple flat-files hosted on s3 (&lt;a href=&quot;https://github.com/okfn/publicbodies/tree/master/site&quot;&gt;build code here&lt;/a&gt;). Here's an example of the output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;European Parliament: &lt;a href=&quot;http://publicbodies.org/eu/european-parliament.html&quot;&gt;http://publicbodies.org/eu/european-parliament.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Associated JSON API (with CORS!) &lt;a href=&quot;http://publicbodies.org/eu/european-parliament.json&quot;&gt;http://publicbodies.org/eu/european-parliament.json&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The simplicity of CSV for data plus simple templating to flat-files is very attractive. There are some drawbacks such as changes to primary template resulting in a full rebuild and upload of ~6k files so, especially as the data grows, we may want to look into something a bit nicer but for the time being this works well.&lt;/p&gt;
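
&lt;p&gt;As a rough illustration of the CSV-plus-templating approach, here is a minimal Python sketch that writes one page and one JSON file per CSV row. The column names, sample row and template are assumptions made for the example; the actual build code in the repo may work quite differently, and a real template would contain HTML markup rather than plain text.&lt;/p&gt;

```python
import csv
import io
import json
import os
import tempfile
from string import Template

# Hypothetical columns and sample row; the real publicbodies CSV layout may differ.
SAMPLE_CSV = "id,name,jurisdiction\neu/example-body,Example Body,EU\n"

# Plain-text stand-in for a page template.
PAGE = Template("$name ($jurisdiction)")


def build_site(csv_text, out_dir):
    """Write one page and one JSON file per CSV row; return the paths written."""
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # An id such as "eu/example-body" becomes a nested path under out_dir.
        base = os.path.join(out_dir, *row["id"].split("/"))
        os.makedirs(os.path.dirname(base), exist_ok=True)
        with open(base + ".html", "w", encoding="utf-8") as f:
            f.write(PAGE.substitute(row))
        with open(base + ".json", "w", encoding="utf-8") as f:
            json.dump(row, f)
        written += [base + ".html", base + ".json"]
    return written


out_dir = tempfile.mkdtemp()
paths = build_site(SAMPLE_CSV, out_dir)
```

&lt;p&gt;The sketch also shows where the drawback mentioned above comes from: every output file depends on the template, so a template change means regenerating and re-uploading all of them.&lt;/p&gt;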

&lt;h2&gt;Next Steps&lt;/h2&gt;

&lt;p&gt;There's plenty that could be improved e.g.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More data - other jurisdictions (we only cover EU, UK and Germany) + descriptions for the bodies (this could be a nice crowdcrafting app)&lt;/li&gt;
&lt;li&gt;Search and Reconciliation (via nomenklatura)&lt;/li&gt;
&lt;li&gt;Making it easier to submit corrections or additions&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The full list of issues is on github here: &lt;a href=&quot;https://github.com/okfn/publicbodies/issues&quot;&gt;https://github.com/okfn/publicbodies/issues&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Help is most definitely wanted! Just grab one of the issues or &lt;a href=&quot;http://okfnlabs.org/contact/&quot;&gt;get in touch&lt;/a&gt; ...&lt;/p&gt;
</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>Quick and Dirty Analysis on Large CSVs</title>
   <link href="http://okfnlabs.org/blog/2013/04/11/quick-and-dirty-analysis-on-large-csv.html"/>
   <updated>2013-04-11T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2013/04/11/quick-and-dirty-analysis-on-large-csv</id>
   <content type="html">&lt;p&gt;I'm playing around with some large(ish) CSV files as part of a &lt;a href=&quot;http://openspending.org/&quot;&gt;OpenSpending&lt;/a&gt; related data investigation to look at UK government spending last year -- example question: which companies were the top 10 recipients of government money? (More details can be
found in &lt;a href=&quot;https://github.com/openspending/thingstodo/issues/5&gt;&quot;&gt;this issue on OpenSpending's things-to-do repo&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The dataset I'm working with is the consolidated spending (over £25k) by all UK government departments. Thanks to the efforts of OpenSpending folks (and specifically Friedrich Lindenberg) this data is already nicely ETL'd from thousands of individual CSV (and xls) files into one big 3.7 GB file (see below for links and details).&lt;/p&gt;

&lt;p&gt;My question is what is the best way to do quick and dirty analysis on this?&lt;/p&gt;

&lt;p&gt;Examples of the kinds of options I was considering were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple scripting (python, perl etc)&lt;/li&gt;
&lt;li&gt;Postgresql - load, build indexes and then sum, avg etc&lt;/li&gt;
&lt;li&gt;Elastic MapReduce (AWS Hadoop)&lt;/li&gt;
&lt;li&gt;Google BigQuery&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Love to hear what folks think and if there are tools or approaches they would specifically recommend.&lt;/p&gt;
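
&lt;p&gt;As one illustration of the simple scripting option, here is a minimal Python sketch that streams a CSV row by row and sums amounts per recipient, so memory use depends on the number of distinct recipients rather than on file size. The column names and sample rows are made up for the example; the real field names are described in the Data Package file.&lt;/p&gt;

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal, InvalidOperation


def top_recipients(lines, name_field, amount_field, n=10):
    """Stream CSV rows and return the n largest (name, total) pairs."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(lines):
        try:
            amount = Decimal(row[amount_field])
        except (InvalidOperation, KeyError, TypeError):
            continue  # skip rows with a missing or unparseable amount
        totals[row[name_field]] += amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up sample rows with hypothetical column names.
SAMPLE = (
    "supplier,amount\n"
    "Alpha Ltd,100.50\n"
    "Beta plc,30\n"
    "Alpha Ltd,20\n"
    "Gamma,not-a-number\n"
)
result = top_recipients(io.StringIO(SAMPLE), "supplier", "amount", n=2)
```

&lt;p&gt;For a real file, pass a file object opened with newline=&quot;&quot; in place of the StringIO sample. Supplier names would probably need reconciling first, otherwise spelling variants of the same company are counted separately.&lt;/p&gt;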

&lt;h3&gt;The Data&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Here's the &lt;a href=&quot;http://data.etl.openspending.org/uk25k/spending-latest.csv&quot;&gt;3.7 GB CSV&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A &lt;a href=&quot;http://www.dataprotocols.org/en/latest/data-packages.html&quot;&gt;Data Package file&lt;/a&gt; for the data describing the fields: &lt;a href=&quot;https://raw.github.com/openspending/dpkg-uk25k/master/datapackage.json&quot;&gt;https://raw.github.com/openspending/dpkg-uk25k/master/datapackage.json&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>sqlaload, an ETL wrapper for SQLAlchemy</title>
   <link href="http://okfnlabs.org/blog/2013/03/30/sqlaload.html"/>
   <updated>2013-03-30T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2013/03/30/sqlaload</id>
   <content type="html">&lt;p&gt;&lt;a href=&quot;https://github.com/okfn/sqlaload&quot;&gt;sqlaload&lt;/a&gt; is a small library that I use to handle databases in Python data processing. In many projects, your process starts with very messy data (something you've scraped or loaded from a hand-prepared Excel sheet). In subsequent stages, you gradually add cleaned values in new columns or new tables. Managing a full SQL schema for such operations can be a hassle, you really want something close to &lt;a href=&quot;http://www.mongodb.org/&quot;&gt;MongoDB&lt;/a&gt;: a NoSQL data store you can throw fairly random data at and get it back later.&lt;/p&gt;

&lt;p&gt;With sqlaload, the idea is to keep some of that schema flexibility while still storing things in a structured database in the background:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;sqlaload&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;sl&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;sqlite:///test.db&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# add some data:  &lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;add_row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;mytable&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;Foo&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;has_this&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;add_row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;mytable&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;Bar&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;has_other&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Look up a record&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;row&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;find_one&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;mytable&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;Foo&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;has_this&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;==&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Update a record:&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;upsert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;mytable&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;Foo&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;location&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;Atlantis&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Or create one:&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;upsert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;mytable&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;Qux&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;location&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;Elsewhere&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;I first saw this type of SQL schema generation implemented in &lt;a href=&quot;http://scraperwiki.com&quot;&gt;ScraperWiki&lt;/a&gt;: they have a couple of &lt;a href=&quot;https://scraperwiki.com/docs/python/python_help_documentation/&quot;&gt;high-level SQLite wrappers&lt;/a&gt; that expand your database as you feed them data. We later adopted that concept for the joint CKAN/ScraperWiki &lt;a href=&quot;https://github.com/okfn/webstore&quot;&gt;webstore&lt;/a&gt;, which neither project ended up using.&lt;/p&gt;

&lt;p&gt;Still, webstore had become an essential part of many of my data projects as an &lt;a href=&quot;http://en.wikipedia.org/wiki/Operational_data_store&quot;&gt;operational data store&lt;/a&gt;. Eventually, I decided to kick out the networking aspect: data access via HTTP is terribly slow and I wanted to have my data in Postgres, not SQLite. The webstore code went into sqlaload, and became a thin wrapper on top of &lt;a href=&quot;http://docs.sqlalchemy.org/en/rel_0_8/&quot;&gt;SQLAlchemy core&lt;/a&gt; (the non-ORM database abstraction part of SQLAlchemy).&lt;/p&gt;

&lt;p&gt;Running on top of SQLAlchemy also means that all of its functionality - for example the &lt;a href=&quot;http://docs.sqlalchemy.org/en/rel_0_8/core/expression_api.html&quot;&gt;query expression language&lt;/a&gt; - is available and can be used for more advanced operations.&lt;/p&gt;

&lt;p&gt;If you want to try it out, sqlaload is now on &lt;a href=&quot;https://pypi.python.org/pypi/sqlaload&quot;&gt;PyPI&lt;/a&gt; and the &lt;a href=&quot;https://github.com/okfn/sqlaload/blob/master/README.md&quot;&gt;README&lt;/a&gt; has a lot of detailed documentation on the library.&lt;/p&gt;
</content>
   <author>
     <name>Friedrich Lindenberg</name>
   </author>
 </entry>
 
 <entry>
   <title>Next Steps for Textus</title>
   <link href="http://okfnlabs.org/blog/2013/03/27/next-steps-for-textus.html"/>
   <updated>2013-03-27T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2013/03/27/next-steps-for-textus</id>
   <content type="html">&lt;p&gt;At the Culture Labs hangout yesterday we wrote up the plans for the next steps for Textus we have been discussing over the last few months.&lt;/p&gt;

&lt;p&gt;The result is this slide deck overview. It both introduces Textus and outlines next steps (slide 12 onwards).&lt;/p&gt;

&lt;iframe src=&quot;https://docs.google.com/presentation/d/1OlXIaGgntenmBLNMu0tZYTdrP09TvzZ-R5bpJAgznF4/embed?start=false&amp;amp;loop=false&amp;amp;delayms=3000&quot; frameborder=&quot;0&quot; width=&quot;580&quot; height=&quot;464&quot; allowfullscreen=&quot;true&quot; mozallowfullscreen=&quot;true&quot; webkitallowfullscreen=&quot;true&quot;&gt;&lt;/iframe&gt;


&lt;h2&gt;Key Points&lt;/h2&gt;

&lt;p&gt;We want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maximize simplicity&lt;/li&gt;
&lt;li&gt;Connect with a CMS (people always want other content than just the texts)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Implications are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Componentize &quot;Textus&quot; and separate text preparation / import from presentation&lt;/li&gt;
&lt;li&gt;Create a plugin to make &quot;Textus&quot; style functionality one-click install into Wordpress&lt;/li&gt;
&lt;li&gt;Eliminate dependencies on ElasticSearch &amp;amp; NodeJS (texts &amp;amp; markup stored in plain files online or in WP ...)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Specifically, we plan to break Textus into 3 components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/CultureLabs/textus-formatter&quot;&gt;textus-formatter&lt;/a&gt; - nodejs app/command line tool for formatting texts&lt;/li&gt;
&lt;li&gt;textus-viewer - JS-only viewer&lt;/li&gt;
&lt;li&gt;textus-wordpress - wordpress integration&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&lt;img src=&quot;https://docs.google.com/drawings/d/1S9Hv98LWdcfuG3KjF1qELsZBp-RQ08Ylo3gxaO6tyQg/pub?w=960&amp;amp;h=720&quot; alt=&quot;&quot; title=&quot;New Architecture&quot; /&gt;&lt;/p&gt;
</content>
   <author>
     <name>Sam Leon</name>
   </author>
 </entry>
 
 <entry>
   <title>Progress on the Data Explorer</title>
   <link href="http://okfnlabs.org/blog/2013/03/18/progress-with-data-explorer.html"/>
   <updated>2013-03-18T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2013/03/18/progress-with-data-explorer</id>
   <content type="html">&lt;p&gt;This is an update on progress with the &lt;a href=&quot;http://explorer.okfnlabs.org/&quot;&gt;Data Explorer&lt;/a&gt; (aka Data Transformer).&lt;/p&gt;

&lt;p&gt;Progress is best seen from this &lt;a href=&quot;http://explorer.okfnlabs.org/#rgrp/e3e0b0f18dfe151f9f7e&quot;&gt;demo which takes you on a tour of house prices and the difference between real and nominal values&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;More information on recent developments can be found below. Feedback is &lt;em&gt;very welcome&lt;/em&gt; - either here or in the issue tracker at &lt;a href=&quot;https://github.com/okfn/dataexplorer&quot;&gt;https://github.com/okfn/dataexplorer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://explorer.okfnlabs.org/#rgrp/e3e0b0f18dfe151f9f7e&quot;&gt;&lt;img src=&quot;http://i.imgur.com/WeDO0vK.png&quot; alt=&quot;House prices tutorial&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What is the Data Explorer&lt;/h2&gt;

&lt;p&gt;For those not familiar, the &lt;a href=&quot;http://explorer.okfnlabs.org/&quot;&gt;Data Explorer is a HTML+JS app&lt;/a&gt; to view, visualize and process data &lt;em&gt;just in the browser&lt;/em&gt; (no backend!). It draws heavily on the &lt;a href=&quot;http://okfnlabs.org/recline/&quot;&gt;Recline library&lt;/a&gt; and features now include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Importing data from various sources (the UX of this could be much improved!)&lt;/li&gt;
&lt;li&gt;Viewing and visualizing using Recline to create grids, graphs and maps&lt;/li&gt;
&lt;li&gt;Cleaning and transforming data using a scripting component that allows you to write and run javascript&lt;/li&gt;
&lt;li&gt;Saving and sharing: everything you create (scripts, graphs etc) can be saved and then shared via public URL.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Note that persistence (for sharing) is to Gists (here's the &lt;a href=&quot;https://gist.github.com/rgrp/e3e0b0f18dfe151f9f7e&quot;&gt;gist for the House Prices demo linked above&lt;/a&gt;). This has some nice benefits such as versioning; offline editing (clone the gist, edit and push); and bl.ocks.org-style ability to create a gist and have it result in publicly viewable output (though with substantial differences vs blocks ...).&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;There are many areas that could be worked on -- a full list of &lt;a href=&quot;https://github.com/okfn/dataexplorer/issues&quot;&gt;issues is in github&lt;/a&gt;. The most important I think at the moment are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/okfn/dataexplorer/issues/88&quot;&gt;Storing the data &quot;locally&quot; in the data project&lt;/a&gt;. At present, data is always loaded from an &quot;external&quot; source. This probably involves extending the current Recline datastore to back on to IndexedDB.&lt;/li&gt;
&lt;li&gt;A &lt;a href=&quot;https://github.com/okfn/dataexplorer/issues/60&quot;&gt;better project creation &amp;amp; data import process&lt;/a&gt; - I think we could learn a lot from Refine here&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/okfn/dataexplorer/issues/84&quot;&gt;&quot;Fork&quot; support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;More &lt;a href=&quot;https://github.com/okfn/dataexplorer/issues/52&quot;&gt;documentation and tutorials especially for scripting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Getting rid of the many rough edges especially on the UX side of things!&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;I'd be very interested in people's thoughts on the app so far and what should be done next. Code contributions are also very welcome (the app has already benefitted from the efforts of many people including the likes of &lt;a href=&quot;http://mk.ucant.org/&quot;&gt;Martin Keegan&lt;/a&gt; and &lt;a href=&quot;https://github.com/michael&quot;&gt;Michael Aufreiter&lt;/a&gt; to the app itself; and from folks like &lt;a href=&quot;http://maxogden.com/&quot;&gt;Max Ogden&lt;/a&gt;, &lt;a href=&quot;http://pudo.org/&quot;&gt;Friedrich Lindenberg&lt;/a&gt;, &lt;a href=&quot;http://casbon.me/&quot;&gt;James Casbon&lt;/a&gt;, &lt;a href=&quot;http://driven-by-data.net/&quot;&gt;Gregor Aisch&lt;/a&gt;, &lt;a href=&quot;http://nigelb.me/&quot;&gt;Nigel Babu&lt;/a&gt; (and many more) in the form of ideas, feedback, work on Recline etc).&lt;/p&gt;
</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>Recline JS - Componentization and a Smaller Core</title>
   <link href="http://okfnlabs.org/blog/2013/02/26/recline-js-componentization-and-a-smaller-core.html"/>
   <updated>2013-02-26T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2013/02/26/recline-js-componentization-and-a-smaller-core</id>
   <content type="html">&lt;p&gt;Over time &lt;a href=&quot;http://okfnlabs.org/recline/&quot;&gt;Recline JS&lt;/a&gt; has grown. In particular, since the first &lt;a href=&quot;http://blog.okfn.org/2012/07/05/announcing-recline-js-a-javascript-library-for-building-data-applications-in-the-browser/&quot;&gt;public
announce of Recline&lt;/a&gt; last summer we've had several people producing
new backends and views (e.g.  &lt;a href=&quot;https://github.com/okfn/recline/wiki/Extensions#list-of-extensions&quot;&gt;backends for Couch, a view for d3, a map view
based on Ordnance Survey's tiles etc etc&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;As &lt;a href=&quot;http://lists.okfn.org/pipermail/okfn-labs/2013-February/000638.html&quot;&gt;I wrote to the labs list recently&lt;/a&gt;, continually adding these to
core Recline runs the risk of bloat. Instead, we think it's better to keep the
core lean and move more of these &quot;extensions&quot; out of core with a clear listing
and curation process - the design of Recline means that &lt;a href=&quot;http://okfnlabs.org/recline/docs/backends.html&quot;&gt;new backends&lt;/a&gt; and
&lt;a href=&quot;http://okfnlabs.org/recline/docs/views.html&quot;&gt;views&lt;/a&gt; can extend the core easily and without any complex dependencies.&lt;/p&gt;

&lt;p&gt;This approach is useful in other ways. For example, Recline backends are
designed to support standalone use as well as use with Recline core (they have
no dependency on &lt;em&gt;any&lt;/em&gt; other part of Recline - &lt;em&gt;including core&lt;/em&gt;), but this is
not obvious while the backends are bundled with Recline. To
take a concrete example, the Google Docs backend is a useful wrapper for the
Google Spreadsheets API in its own right. That is already true today, but it is
hard to see while the code lives in the main Recline repository; splitting it
out into a repo with its own README would make it much clearer.&lt;/p&gt;

&lt;h2&gt;So the plan is ...&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Announce this approach of a leaner core and more &quot;Extensions&quot;

&lt;ul&gt;
&lt;li&gt;Link to the specifications for &lt;a href=&quot;http://okfnlabs.org/recline/docs/backends.html&quot;&gt;Backends&lt;/a&gt; and &lt;a href=&quot;http://okfnlabs.org/recline/docs/views.html&quot;&gt;Views&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Create an official &lt;a href=&quot;https://github.com/okfn/recline/wiki/Extensions#list-of-extensions&quot;&gt;Recline Extensions page&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Identify first items to split out from core - see &lt;a href=&quot;https://github.com/okfn/recline/issues/314&quot;&gt;this issue&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Identify which components &lt;em&gt;should&lt;/em&gt; remain in core (I'm thinking Dataset +
Memory DataStore plus one Grid, Graph and Map)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;So far I've already started the process of factoring out some backends (and
soon views) into standalone repos, e.g. here's GDocs:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/okfn/recline.backend.gdocs&quot;&gt;https://github.com/okfn/recline.backend.gdocs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Any thoughts very welcome and if you already have Recline extensions lurking in
your repos please add them to the &lt;a href=&quot;https://github.com/okfn/recline/wiki/Extensions#list-of-extensions&quot;&gt;wiki page&lt;/a&gt;&lt;/p&gt;
</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>Exporting PyBossa data to CSV or JSON with one click</title>
   <link href="http://okfnlabs.org/blog/2013/02/20/exporting-pybossa-data-to-csv-with-one-click.html"/>
   <updated>2013-02-20T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2013/02/20/exporting-pybossa-data-to-csv-with-one-click</id>
   <content type="html">&lt;p&gt;I'm really happy to announce that today we have finally added a feature that
will allow to &lt;a href=&quot;http://docs.pybossa.com/en/latest/user/tutorial.html#exporting-the-obtained-results&quot;&gt;export your data&lt;/a&gt; into a CSV format with just one click
(we also support the same for JSON).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://i.imgur.com/zqPkMST.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For this purpose, all the applications in PyBossa now feature a new URI:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;http://PYBOSSA-SERVER/app/slug/export&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;There you will find several options to export the tasks or task runs (the answers)
to different formats. In the case of the CSV format, you will get a CSV file
that can be downloaded to your computer and loaded later into any spreadsheet
software :-)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://i.imgur.com/zVZCYW8.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: bear in mind that CSV is a flat format, so nested JSON objects will
be &quot;dumped&quot; as they are. For example, if you are using GeoJSON to store
a location, the CSV file will contain the JSON object as a string.
You can see &lt;a href=&quot;http://crowdcrafting.org/app/urbanpark/export?type=task&amp;amp;format=csv&quot;&gt;an example of this issue in the Urban Parks application&lt;/a&gt;, as this
demo application uses the &lt;a href=&quot;http://www.geojson.org/&quot;&gt;GeoJSON&lt;/a&gt; format for storing the location of the parks.&lt;/p&gt;
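&lt;p&gt;To illustrate the note above, here is a minimal JavaScript sketch of how a nested value ends up as a JSON string inside a flat CSV cell. It is not PyBossa's actual export code: the &lt;code&gt;csvField&lt;/code&gt; and &lt;code&gt;toCsv&lt;/code&gt; helpers and the task fields are made up for the example.&lt;/p&gt;

```javascript
// Sketch only: hypothetical helpers and task fields, not the PyBossa exporter.

// Quote one CSV field, doubling any embedded double quotes.
function csvField(value) {
  var text;
  if (value === null || value === undefined) {
    text = '';
  } else if (typeof value === 'object') {
    // Nested objects and arrays cannot be flattened, so dump them as JSON.
    text = JSON.stringify(value);
  } else {
    text = String(value);
  }
  return '"' + text.replace(/"/g, '""') + '"';
}

// Build a CSV document (header line plus one line per row).
function toCsv(rows, columns) {
  var lines = [columns.map(csvField).join(',')];
  rows.forEach(function (row) {
    lines.push(columns.map(function (col) {
      return csvField(row[col]);
    }).join(','));
  });
  return lines.join('\n');
}

// A made-up task with a GeoJSON point as its location.
var tasks = [
  { id: 1, name: 'Example park', location: { type: 'Point', coordinates: [-3.7, 40.4] } }
];
var csv = toCsv(tasks, ['id', 'name', 'location']);
console.log(csv);
```

&lt;p&gt;When reading such a file back, the location cell has to be passed through &lt;code&gt;JSON.parse&lt;/code&gt; to recover the original object.&lt;/p&gt;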

&lt;p&gt;If you prefer JSON, just click on any of the buttons and save the generated file!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://i.imgur.com/vBDWLeb.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If you want to try the new feature, just go ahead and check it out on &lt;a href=&quot;http://crowdcrafting.org&quot;&gt;CrowdCrafting.org&lt;/a&gt;.&lt;/p&gt;
</content>
   <author>
     <name>Daniel Lombraña González</name>
   </author>
 </entry>
 
 <entry>
   <title>Mozilla FirefoxOS App Days &amp; Crowdcrafting.org</title>
   <link href="http://okfnlabs.org/blog/2013/01/29/firefoxappday.html"/>
   <updated>2013-01-29T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2013/01/29/firefoxappday</id>
   <content type="html">&lt;p&gt;&lt;img class=&quot;pull-left&quot; src=&quot;https://hacks.mozilla.org/wp-content/uploads/2012/12/firefoxOS-app-days_graphic_RGB.png&quot;/&gt;
Last Saturday, the 26th of January, &lt;a href=&quot;https://hacks.mozilla.org/2013/01/join-us-for-firefox-os-app-days/&quot;&gt;Mozilla held in parallel in 25 cities all over the world a hack day&lt;/a&gt;, the &lt;a href=&quot;https://twitter.com/search?q=%23firefoxosappdays&amp;amp;src=tyah&quot;&gt;#FirefoxOSAppDay&lt;/a&gt;, about creating new web applications for their new &lt;a href=&quot;http://www.mozilla.org/en-US/firefoxos/&quot;&gt;FirefoxOS mobile OS&lt;/a&gt; and the desktop web browser (this is still in beta and alpha mode!).&lt;/p&gt;

&lt;p&gt;One of the events was held in Madrid, Spain, organized by the &lt;a href=&quot;http://www.mozilla-hispano.org/&quot;&gt;Mozilla Hispano Community&lt;/a&gt; so I had the chance to spend some time with the Mozilla community and play with the &lt;a href=&quot;https://developer.mozilla.org/en/docs/Mozilla/Firefox_OS&quot;&gt;new APIs and developer tools for their new platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img class=&quot;pull-right&quot; src=&quot;http://mozorg.cdn.mozilla.net/media/img/firefoxos/firefox-phone.png&quot;/&gt;&lt;/p&gt;

&lt;p&gt;In the morning we attended several talks by experts on the new APIs that
Mozilla is developing to integrate mobile features, for example the &lt;a href=&quot;http://www.w3.org/TR/battery-status/&quot;&gt;battery
API&lt;/a&gt; that will allow you to check the
device battery status (right now integrated in the W3C standards) or the &lt;a href=&quot;https://wiki.mozilla.org/WebAPI/AlarmAPI&quot;&gt;Alarm
API&lt;/a&gt; that you can use to schedule a
notification, or for an application to be started, at a specific time.&lt;/p&gt;

&lt;p&gt;Mozilla is working really hard to standardize and integrate most of &lt;a href=&quot;https://wiki.mozilla.org/WebAPI&quot;&gt;these APIs&lt;/a&gt; into the W3C in order to make them available in any web browser. Some of the APIs have already been accepted by the W3C, for example the &lt;a href=&quot;http://dvcs.w3.org/hg/dap/raw-file/tip/battery/Overview.html&quot;&gt;Battery Status API&lt;/a&gt;, &lt;a href=&quot;http://dvcs.w3.org/hg/dap/raw-file/tip/network-api/index.html&quot;&gt;Network Information API&lt;/a&gt;, &lt;a href=&quot;http://www.w3.org/TR/ambient-light/&quot;&gt;Ambient light sensor&lt;/a&gt; or the &lt;a href=&quot;http://www.w3.org/TR/2012/WD-proximity-20120712/&quot;&gt;Proximity sensor&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Mozilla also presented their efforts in making it as easy as possible to create an
application from scratch re-using several &lt;a href=&quot;http://buildingfirefoxos.com/&quot;&gt;building-blocks&lt;/a&gt;
they have created for their new platform. Basically, they have created &lt;a href=&quot;http://buildingfirefoxos.com/&quot;&gt;a web page&lt;/a&gt; where you can copy and paste code snippets that you can later re-use in your own application,
keeping the look and feel of the platform.&lt;/p&gt;

&lt;p&gt;After the talks, all the participants had a better idea of what we could
develop with the platform: a web application that could use the hardware of the
new mobile phone devices, as well as Android phones out of the box!&lt;/p&gt;

&lt;p&gt;As the goal of the day was to create an app for the FirefoxOS, my idea was to create an application that could help to track when a new scientific application has been added to &lt;a href=&quot;http://crowdcrafting.org&quot;&gt;&lt;strong&gt;crowd&lt;/strong&gt;crafting&lt;/a&gt; so you could help doing some tasks in the new application.&lt;/p&gt;

&lt;p&gt;The web application basically lets you know which apps are new since the last time you checked.&lt;/p&gt;
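&lt;p&gt;The core of that idea can be sketched in a few lines of JavaScript. This is not the code of the app itself: the &lt;code&gt;findNewApps&lt;/code&gt; helper, the shape of the app objects and the list of already seen slugs are assumptions made for illustration.&lt;/p&gt;

```javascript
// Sketch only: hypothetical data shapes, not the code of the hackday app.

// Return the apps whose slug was not seen at the previous check.
function findNewApps(currentApps, seenSlugs) {
  return currentApps.filter(function (app) {
    return seenSlugs.indexOf(app.slug) === -1;
  });
}

// Slugs remembered from the last visit (a real app might keep these in localStorage).
var seen = ['flickrperson', 'urbanpark'];

// The list of apps as fetched now.
var current = [
  { slug: 'flickrperson', name: 'Flickr Person Finder' },
  { slug: 'urbanpark', name: 'Urban Parks' },
  { slug: 'thefacewemake', name: 'The Face We Make' }
];

var fresh = findNewApps(current, seen);
console.log(fresh.map(function (app) { return app.name; }));

// Remember everything for the next check.
seen = current.map(function (app) { return app.slug; });
```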

&lt;p&gt;The application works in any web browser (even Chrome) but if you want to get a feel for how it will work in the new OS you can try it on your phone if you have an Android device. You will need to install the &lt;a href=&quot;http://nightly.mozilla.org/&quot;&gt;latest Firefox nightly&lt;/a&gt; (&lt;strong&gt;note: &lt;/strong&gt;&lt;em&gt;this is an experimental build, so it may crash on your phone!&lt;/em&gt;) and then type this URL:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://daniellombrana.es/crowdcrafting-app&quot;&gt;http://daniellombrana.es/crowdcrafting-app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will be able to install it on your phone and run it whenever you want directly from your home screen. If you don't want to install the browser, just open the link with a modern web browser and you should see it running (the install button will only work in &lt;a href=&quot;http://www.mozilla.org/en-US/firefox/channel/&quot;&gt;Firefox Beta&lt;/a&gt; and &lt;a href=&quot;http://www.mozilla.org/en-US/firefox/channel/#aurora&quot;&gt;Aurora builds&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://i.imgur.com/xjljFcc.png&quot; alt=&quot;FirefoxOS Crowdcrafting app&quot; /&gt;&lt;/p&gt;
</content>
   <author>
     <name>Daniel Lombraña González</name>
   </author>
 </entry>
 
 <entry>
   <title>PyBossa.JS or how you can easily create new PyBossa applications</title>
   <link href="http://okfnlabs.org/blog/2013/01/28/pybossa-js.html"/>
   <updated>2013-01-28T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2013/01/28/pybossa-js</id>
   <content type="html">&lt;p&gt;In the last weeks we have been working hard in order to make easier to develop new PyBossa applications. For this reason, we are happy to announce a new version of PyBossa.JS. This new version introduces several improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Creating an app is much easier!&lt;/strong&gt; You only have to override two methods: pybossa.taskLoaded and pybossa.presentTask to fit your app, and call pybossa.run('your-app-slug').&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pre-loading tasks by default!&lt;/strong&gt; Your app can now perform better, as the next task will be loaded in the background while the user is still solving the current one!&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automatically update the task URL&lt;/strong&gt;. The library will change the browser's URL to the current task automatically, so using services like Disqus for comments is really simple (check the updated version of &lt;a href=&quot;http://crowdcrafting.org/app/flickrperson&quot;&gt;Flickr Person Finder&lt;/a&gt; for more details!).&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;As a result of this new version, there are at least two applications using the new PyBossa.JS version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://crowdcrafting.org/app/flickrperson&quot;&gt;Flickr Person Finder&lt;/a&gt; has been updated to use this new set of features. If you try the application you will see that loading the next task (in this case an image which is usually 1024x1024px) is almost instant. Additionally, the app shows how you can use the Disqus service to allow your users to add comments for each task, loading them only when the user wants.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://crowdcrafting.org/app/thefacewemake&quot;&gt;The Face We Make&lt;/a&gt; is a new application where you have to guess the emoticon that a person is representing in a photo. This app is a joint effort with the official &lt;a href=&quot;http://thefacewemake.org/about/&quot;&gt;The Face We Make&lt;/a&gt; project by &lt;a href=&quot;http://dxtr.com/&quot;&gt;Dexter Miranda&lt;/a&gt; and &lt;a href=&quot;http://daniellombrana.es&quot;&gt;Daniel Lombraña González&lt;/a&gt;. The app has been updated to use the new pre-loading of tasks and, once you have completed all of them (only 10 photos!), shows you your results, in other words, how many of your guesses were right or wrong.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Finally, we have also added the &quot;missing features&quot; that allow you to create an application without using the API. Right now, you can create an application using only the web forms for creating the application:
&lt;a rel=&quot;lightbox&quot; title=&quot;Web form for creating an application&quot; href=&quot;/img/pybossa-create-app.png&quot;&gt;&lt;img src=&quot;/img/pybossa-create-app.png&quot; alt=&quot;Web form for creating an app&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also add and work on the task presenter (we have included the &lt;a href=&quot;http://codemirror.net&quot;&gt;CodeMirror plugin&lt;/a&gt;, so you will see how your code looks as you type it!):
&lt;a rel=&quot;lightbox&quot; title=&quot;Web form for editing the task presenter&quot; href=&quot;/img/pybossa-task-presenter-editor.png&quot;&gt;&lt;img src=&quot;/img/pybossa-task-presenter-editor.png&quot; alt=&quot;Web form for editing the task presenter&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As well as importing the tasks via a CSV file importer (you can even import the CSV file from a Google Spreadsheet!):&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;lightbox&quot; title=&quot;Web form for importing tasks from a CSV file&quot; href=&quot;/img/pybossa-csv-import.png&quot;&gt;&lt;img src=&quot;/img/pybossa-csv-import.png&quot; alt=&quot;Web form for importing tasks from a CSV file&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The documentation has been updated to reflect these new features, and as a result you should be able to write an application really fast. However, we are far from perfect, so any feedback that you can give us is very welcome! Please leave your feedback in the comments or send us an e-mail at info@pybossa.com. We will be more than happy to hear your thoughts on PyBossa!&lt;/p&gt;
</content>
   <author>
     <name>Daniel Lombraña González</name>
   </author>
 </entry>
 
 <entry>
   <title>Journoid, data notifications</title>
   <link href="http://okfnlabs.org/blog/2013/01/25/journoid.html"/>
   <updated>2013-01-25T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2013/01/25/journoid</id>
   <content type="html">&lt;p&gt;At the &lt;a href=&quot;http://okfnlabs.org/events/hackdays/lobbying.html&quot;&gt;Open Interests&lt;/a&gt; hackday in November, a discussion with &lt;a href=&quot;http://www.martinstabe.com/&quot;&gt;Martin Stabe&lt;/a&gt; from the &lt;a href=&quot;www.ft.com/intl/interactive&quot;&gt;FT's interactive desk&lt;/a&gt; led a prototype of &lt;a href=&quot;https://github.com/pudo/journoid&quot;&gt;Journoid&lt;/a&gt;. The idea is to monitor changing on-line datasets for remarkable information, like &lt;a href=&quot;http://datadesk.latimes.com/&quot;&gt;earthquakes&lt;/a&gt;, procurement in a particular industry or a close parliamentary vote. While we'd discussed alerting in the context of &lt;a href=&quot;http://openspending.org/&quot;&gt;OpenSpending&lt;/a&gt; before, Martin had a pretty specific list of wishes that neither &lt;a href=&quot;http://pandaproject.net/&quot;&gt;PANDA&lt;/a&gt; nor &lt;a href=&quot;http://ifttt.com/&quot;&gt;IFTTT&lt;/a&gt; can handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Search not just for a single keyword or query, but compare the incoming data to a table of matches, such as a list of famous people, well-known companies or any other set of items that you may be interested in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Google Docs for configuration. The FT uses Google Apps internally and it's an interface that their reporters already understand - just add a &quot;Config&quot; sheet to your keyword document, and store all relevant settings - like the source URL and recipient email - in there.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The &lt;a href=&quot;https://github.com/pudo/journoid&quot;&gt;Journoid&lt;/a&gt; prototype from the hackday only fulfills the first of those requirements - and I'm still struggling with #2, as it's surprisingly hard to find a good Google Docs client library for Python.&lt;/p&gt;

&lt;p&gt;Still, the hack was a nice demo: sift through a &lt;a href=&quot;http://data.etl.openspending.org/uk25k/&quot;&gt;data dump from the UK departmental spending&lt;/a&gt;, check the supplier information against a list of companies of interest and finally send a message if there is a hit.&lt;/p&gt;
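&lt;p&gt;The matching step of such a demo might look roughly like the JavaScript sketch below. It is not taken from Journoid: the record fields, the watch list and the &lt;code&gt;normalise&lt;/code&gt; and &lt;code&gt;findHits&lt;/code&gt; helpers are all invented for illustration, and the names are compared only after a crude normalisation.&lt;/p&gt;

```javascript
// Sketch only: hypothetical record fields and watch list, not Journoid's code.

// Reduce a company name to lower-case words so trivial differences do not matter.
function normalise(name) {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

// Return the records whose supplier appears in the watch list.
function findHits(records, watchList) {
  var wanted = {};
  watchList.forEach(function (name) {
    wanted[normalise(name)] = true;
  });
  return records.filter(function (record) {
    return wanted[normalise(record.supplier)] === true;
  });
}

// Made-up companies of interest and made-up spending records.
var watchList = ['Acme Holdings Ltd', 'Example Consulting PLC'];
var records = [
  { supplier: 'ACME HOLDINGS LTD.', amount: 25000 },
  { supplier: 'Some Other Supplier', amount: 31000 }
];

var hits = findHits(records, watchList);
hits.forEach(function (hit) {
  // A real tool would send a message here instead of logging.
  console.log('Match: ' + hit.supplier + ' (' + hit.amount + ')');
});
```

&lt;p&gt;Exact matching after normalisation will miss misspellings and trading names, so this should be read as a starting point only.&lt;/p&gt;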

&lt;p&gt;As a further experiment, I was able to use &lt;a href=&quot;http://opencorporates.com/&quot;&gt;OpenCorporates&lt;/a&gt; to check the supplier's company status, answering a simple but interesting question: does the government do business with insolvent (or even dissolved) companies? It's interesting to think what other matches can be made when the comparison list is actually an API.&lt;/p&gt;

&lt;p&gt;What's next? It's time to clean up the &lt;a href=&quot;https://github.com/pudo/journoid/tree/master/journoid&quot;&gt;messy hackday code&lt;/a&gt;, to finish up GDocs configuration, some hosted solution and possibly a few other input formats.&lt;/p&gt;

&lt;p&gt;This will also probably be my last post to OKFN Labs - early next month,
I'll join &lt;a href=&quot;http://mozillaopennews.org&quot;&gt;Knight-Mozilla OpenNews&lt;/a&gt; at
&lt;a href=&quot;http://spiegel.de&quot;&gt;Spiegel Online&lt;/a&gt; to spend ten months working on tools
like this, assisting journalists in telling more compelling stories on
the web. I hope that by continuing to cooperate with my friends in the
&lt;a href=&quot;http://spendingstories.org&quot;&gt;Spending Stories&lt;/a&gt; project on Journoid and
similar efforts we can bring open (and some non-open) data into the
media, making a difference.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Photo credit: Mike Tigas, &lt;a href=&quot;http://www.flickr.com/photos/madmannova/8384618902/sizes/l/in/set-72157632527677275/&quot;&gt;If this then news demo&lt;/a&gt; (similar project we've
started at OpenNews)&lt;/em&gt;&lt;/p&gt;
</content>
   <author>
     <name>Friedrich Lindenberg</name>
   </author>
 </entry>
 
 <entry>
   <title>Web Scraping with CSS Selectors in Node using JSDOM or Cheerio</title>
   <link href="http://okfnlabs.org/blog/2013/01/15/web-scraping-with-node-css-selectors.html"/>
   <updated>2013-01-15T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2013/01/15/web-scraping-with-node-css-selectors</id>
   <content type="html">&lt;p&gt;I've traditionally used python for web scraping but I'd been increasingly thinking about using Node given that it is pure JS and therefore could be a more natural fit when getting info out of &lt;em&gt;web&lt;/em&gt; pages.&lt;/p&gt;

&lt;p&gt;In particular, my first step when looking to extract information from a website is to open up the Chrome Developer tools (or Firebug in Firefox) and try to extract information by inspecting the page and playing around in the console - the latter is especially attractive if jQuery is available.&lt;/p&gt;

&lt;p&gt;What I often end up with from this is a few lines of jQuery selectors. My desire here was to find a way to reuse the same css selectors I use in my browser experimentation directly in the scraping script. Now, things like &lt;a href=&quot;http://packages.python.org/pyquery/&quot;&gt;pyquery&lt;/a&gt; do exist in python (and there is some css selector support in the brilliant BeautifulSoup) but a connection with something like Node seems even more natural - it is, after all, the JS engine from a browser!&lt;/p&gt;

&lt;h2&gt;UK Crime Data&lt;/h2&gt;

&lt;p&gt;My immediate motivation for this work was wanting to play around with the &lt;a href=&quot;http://police.uk/data&quot;&gt;UK Crime data&lt;/a&gt; (all &lt;a href=&quot;http://opendefinition.org/&quot;&gt;open data&lt;/a&gt; now!).&lt;/p&gt;

&lt;p&gt;To do this I needed to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get the data in consolidated form by scraping the file list and data files from &lt;a href=&quot;http://police.uk/data/&quot;&gt;http://police.uk/data/&lt;/a&gt; - while they commendably provide the data in bulk there is no single file to download, instead there is one file per force per month.&lt;/li&gt;
&lt;li&gt;Do data cleaning and analysis - this included some fun geo-conversion and csv parsing&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;I'm just going to talk about the first part in what follows - though I hope to cover the second part in a follow up post.&lt;/p&gt;

&lt;p&gt;I should also note that all the code used for scraping and working with this data can be found in the &lt;a href=&quot;https://github.com/datasets/crime-uk&quot;&gt;UK Crime dataset data package on GitHub&lt;/a&gt; - &lt;a href=&quot;https://github.com/datasets/crime-uk/blob/master/scripts/scrape.js&quot;&gt;scrape.js file is here&lt;/a&gt;. You can also see some of the ongoing results of these data experiments in an experimental &lt;a href=&quot;http://okfnlabs.org/crime/&quot;&gt;UK crime &quot;dashboard&quot; here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Scraping using CSS Selectors in Node&lt;/h2&gt;

&lt;p&gt;Two options present themselves when doing simple scraping using css selectors in node.js:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using &lt;a href=&quot;https://github.com/tmpvar/jsdom&quot;&gt;jsdom&lt;/a&gt; (+ jquery)&lt;/li&gt;
&lt;li&gt;Using &lt;a href=&quot;https://github.com/MatthewMueller/cheerio&quot;&gt;cheerio&lt;/a&gt; (which provides jquery-like access to html) + something to retrieve html (my preference is &lt;a href=&quot;https://github.com/mikeal/request&quot;&gt;request&lt;/a&gt; but you can just use &lt;a href=&quot;http://nodejs.org/docs/v0.6.11/api/http.html#http.request&quot;&gt;node's built in http request&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;For the UK crime work I used jsdom but I've subsequently used cheerio as it is substantially faster so I'll cover both here (I didn't discover cheerio until I'd started on the crime work!).&lt;/p&gt;

&lt;p&gt;Here's an excerpted code example (full example in the &lt;a href=&quot;https://github.com/datasets/crime-uk/blob/master/scripts/scrape.js&quot;&gt;source file&lt;/a&gt;):&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;&lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;http://police.uk/data&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// holder for results&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;s1&quot;&gt;&amp;#39;streets&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nx&quot;&gt;jsdom&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;scripts&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
    &lt;span class=&quot;s1&quot;&gt;&amp;#39;http://code.jquery.com/jquery.js&amp;#39;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;done&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;errors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// find all the html links to the street zip files&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;#downloads .months table tr td:nth-child(2) a&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;each&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;elem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;c1&quot;&gt;// push the url (href attribute) onto the list&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;streets&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;push&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;elem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;attr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;href&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;As an example of scraping with Cheerio, here's a snippet from work &lt;a href=&quot;https://github.com/datasets/opented&quot;&gt;scraping information from the EU's TED database&lt;/a&gt; (sample &lt;a href=&quot;http://files.opented.org.s3.amazonaws.com/scraped/100120-2011/summary.html&quot;&gt;html file&lt;/a&gt;):&lt;/p&gt;

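&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// libraries used below (npm install request cheerio)&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;request&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;request&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;cheerio&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;cheerio&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;http://files.opented.org.s3.amazonaws.com/scraped/100120-2011/summary.html&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;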
&lt;span class=&quot;c1&quot;&gt;// place to store results&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{};&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// do the request using the request library&lt;/span&gt;
&lt;span class=&quot;nx&quot;&gt;request&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;resp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;body&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;
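  &lt;span class=&quot;c1&quot;&gt;// stop early if the download failed&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;throw&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;err&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// declare $ locally instead of leaking a global&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;cheerio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;body&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;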

  &lt;span class=&quot;nx&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;winnerDetails&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;.txtmark .addr&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;

  &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;.mlioccur .txtmark&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;each&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;spans&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;find&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;span&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;span0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;spans&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;span0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;Initial estimated total value of the contract &amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;amount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;spans&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;finalamount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;cleanAmount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;amount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;initialamount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;cleanAmount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;spans&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
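
&lt;p&gt;The &lt;code&gt;cleanAmount&lt;/code&gt; helper used above is not shown in this post. A hypothetical version might look like the sketch below. It assumes the amount text looks something like 'Value: 1 234 567,50 EUR' (space-grouped thousands and a decimal comma), which is an assumption about the page format rather than something taken from the scraper itself.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;// Hypothetical helper: pull the first number out of a piece of text.
// Assumes space- or dot-grouped thousands and a comma as decimal separator.
function cleanAmount(text) {
  var match = /\d[\d\s.]*(?:,\d+)?/.exec(text);
  if (!match) {
    return null; // no digits found
  }
  // Drop grouping characters, then turn the decimal comma into a point.
  var normalized = match[0].replace(/[\s.]/g, '').replace(',', '.');
  return parseFloat(normalized);
}

console.log(cleanAmount('Value: 1 234 567,50 EUR'));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Returning a number (or &lt;code&gt;null&lt;/code&gt; when nothing matches) keeps the scraped record easy to sum or compare later on.&lt;/p&gt;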



</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>Archiving Twitter the Hacky Way</title>
   <link href="http://okfnlabs.org/blog/2013/01/08/archiving-twitter-feeds-the-hacky-way.html"/>
   <updated>2013-01-08T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2013/01/08/archiving-twitter-feeds-the-hacky-way</id>
   <content type="html">&lt;p&gt;There are many circumstances where you want to archive a tweets - maybe just from your own account or perhaps for a hashtag for an event or topic.&lt;/p&gt;

&lt;p&gt;Unfortunately Twitter search queries do not give data more than 7 days old and for a given account you can only get approximately the last 3200 of your tweets and 800 items from your timeline. [Update: People have pointed out that &lt;a href=&quot;http://blog.twitter.com/2012/12/your-twitter-archive.html&quot;&gt;Twitter released a feature to download an archive of your personal tweets at the end of December&lt;/a&gt; - this, of course, still doesn't help with queries or hashtags]&lt;/p&gt;

&lt;p&gt;Thus, if you want to archive Twitter you'll need to come up with another solution (or pay them, or a reseller, a bunch of money - see Appendix below!). Sadly, most of the online solutions have tended to disappear or be acquired over time (e.g. twapperkeeper). So a DIY solution would be attractive. After reading various proposals on the web I've found the following to work pretty well (but see also this &lt;a href=&quot;http://mashe.hawksey.info/2012/01/twitter-archive-tagsv3/&quot;&gt;excellent Google Spreadsheet-based solution&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The proposed process involves 3 steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Locate the Twitter Atom Feed for your Search&lt;/li&gt;
&lt;li&gt;Use Google Reader as your Archiver&lt;/li&gt;
&lt;li&gt;Get your data out of Google Reader (1000 items at a time!)&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;One current drawback of this solution is that each stage has to be done by hand. It could be possible to automate more of this, and especially the important third step, if I could work out how to do more with the &lt;a href=&quot;http://undoc.in/&quot;&gt;Google Reader API&lt;/a&gt;. Contributions or suggestions here would be very welcome!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note that the above method will become obsolete as of March 5 2013 when &lt;a href=&quot;https://dev.twitter.com/docs/api/1.1/overview#New_Twitter_client_policies&quot;&gt;Twitter close down RSS and Atom feeds&lt;/a&gt; - continuing their long march to becoming a &lt;del&gt;fully&lt;/del&gt; more closed and controlled ecosystem.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;As you struggle, like me, to get precious archival information out of Twitter it may be worth reflecting on just how much information you've given to Twitter that you are now unable to retrieve (at least without paying) ...&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Twitter Atom Feed&lt;/h2&gt;

&lt;p&gt;Twitter still have Atom feeds for their search queries:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://search.twitter.com/search.atom?q=my_search&quot;&gt;http://search.twitter.com/search.atom?q=my_search&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that if you want to search for a hash tag like #OpenData or a user e.g. @someone you'll need to escape the symbols:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://search.twitter.com/search.atom?q=%23OpenData&quot;&gt;http://search.twitter.com/search.atom?q=%23OpenData&lt;/a&gt;&lt;/p&gt;
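
&lt;p&gt;If you are building these URLs in JavaScript, the built-in &lt;code&gt;encodeURIComponent&lt;/code&gt; does the escaping for you. A minimal sketch (the function name &lt;code&gt;searchFeedUrl&lt;/code&gt; is just for illustration):&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;// Build a search feed URL for a raw query such as '#OpenData' or '@someone'.
function searchFeedUrl(query) {
  return 'http://search.twitter.com/search.atom?q=' + encodeURIComponent(query);
}

console.log(searchFeedUrl('#OpenData'));
console.log(searchFeedUrl('@someone'));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here # is escaped as %23 and @ as %40.&lt;/p&gt;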

&lt;p&gt;Unfortunately Twitter Atom queries return only a few items (around 20), so we'll need to archive that feed continuously to get full coverage.&lt;/p&gt;

&lt;h2&gt;Archiving in Google Reader&lt;/h2&gt;

&lt;p&gt;Just add the previous feed URL to your Google Reader account. It will then start archiving.&lt;/p&gt;

&lt;p&gt;Aside: because the Twitter Atom feed is limited to a small number of items and Google Reader only checks it every 3 hours (every hour if someone else is archiving the same feed), you can miss a lot of tweets. One option could be to use Topsy's RSS feeds &lt;a href=&quot;http://otter.topsy.com/searchdate.rss?q=%23okfn&quot;&gt;http://otter.topsy.com/searchdate.rss?q=%23okfn&lt;/a&gt; (though it is not clear how to get more items from this feed either!)&lt;/p&gt;

&lt;h2&gt;Getting Data out of Google Reader&lt;/h2&gt;

&lt;p&gt;Google Reader offers a decent (though still beta) API. Unofficial docs for it can be found here: &lt;a href=&quot;http://undoc.in/&quot;&gt;http://undoc.in/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key URL we need is:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://www.google.com/reader/atom/feed/[feed_address]?n=1000&quot;&gt;http://www.google.com/reader/atom/feed/[feed_address]?n=1000&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that the feed is limited to a maximum of 1000 items and you can only access it for your account if you are logged in. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you have more than 1000 items you need to find the continuation token in each set of results and then add &amp;amp;c={continuation-token} to your query.&lt;/li&gt;
&lt;li&gt;Because you need to be logged in via your browser, you need to do this by hand :-( (it may be possible to automate via the API but I couldn't get anything to work - any tips much appreciated!)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Here's a concrete example (note, as you need to be logged in this won't work for you):&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://www.google.com/reader/atom/feed/http://search.twitter.com/search.atom%3Fq%3D%2523OpenData?n=1000&quot;&gt;http://www.google.com/reader/atom/feed/http://search.twitter.com/search.atom%3Fq%3D%2523OpenData?n=1000&lt;/a&gt;&lt;/p&gt;
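
&lt;p&gt;Note the double escaping in that URL: the # was already escaped as %23 in the Twitter feed address, and the % is escaped again (as %25) once the whole address is embedded in the Google Reader URL. A small sketch of how such a URL could be put together follows. The helper name &lt;code&gt;readerArchiveUrl&lt;/code&gt; is hypothetical, the set of characters to escape is inferred from the example above rather than from any documentation, and &lt;code&gt;URLSearchParams&lt;/code&gt; needs a reasonably modern browser or Node.js.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;// Hypothetical helper, not part of any Google Reader client library.
function readerArchiveUrl(feedUrl, continuation) {
  // Escape '%' first so existing escapes survive, then '?' and '='
  // so they are not read as part of the outer URL's query string.
  var escapedFeed = feedUrl
    .replace(/%/g, '%25')
    .replace(/\?/g, '%3F')
    .replace(/=/g, '%3D');
  var params = new URLSearchParams({ n: '1000' });
  if (continuation) {
    params.append('c', continuation); // token taken from the previous page of results
  }
  return 'http://www.google.com/reader/atom/feed/' + escapedFeed + '?' + params.toString();
}

var feed = 'http://search.twitter.com/search.atom?q=' + encodeURIComponent('#OpenData');
console.log(readerArchiveUrl(feed));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Called without a continuation token this reproduces the example URL above. You still need to be logged in to fetch the result.&lt;/p&gt;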

&lt;p&gt;And that's it! You should now have a local archive of all your tweets!&lt;/p&gt;

&lt;h2&gt;Appendix&lt;/h2&gt;

&lt;p&gt;Increasingly, Twitter is selling access to the full Twitter archive, and there are a variety of third-party services (such as Gnip, DataSift, Topsy &lt;a href=&quot;https://dev.twitter.com/programs/twitter-certified-products/products#Data&quot;&gt;and possibly more&lt;/a&gt;) that offer full or partial access for a fee.&lt;/p&gt;
</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>Bundes-Git – German Laws on GitHub</title>
   <link href="http://okfnlabs.org/blog/2012/12/13/bundesgit-german-laws-on-github.html"/>
   <updated>2012-12-13T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2012/12/13/bundesgit-german-laws-on-github</id>
   <content type="html">&lt;p&gt;If you compare software code and legislation you can find many similarities: both are big bodies of text spread over multiple units (laws/files). The total amount of text inevitably grows bigger over time with many small changes to existing parts while most of the corpus stays the same.&lt;/p&gt;

&lt;p&gt;However, the tooling and editing process for these domains is very different: while developers are in the fortunate position that they can build and improve their own tools, legislators are stuck with proprietary tools like MS Word that are simply not built to collaboratively work on a big corpus of text.&lt;/p&gt;

&lt;p&gt;But if source code and laws have a similar information structure, why not apply the tools used in software development to the legislative process? That is what Bundes-Git (&quot;Federal Git&quot;) is currently trying out in Germany.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/bundestag/gesetze&quot;&gt;Bundes-Git&lt;/a&gt; is a Git version control repository of all German Federal Laws and Regulations as Markdown. The goal was to come up with the simplest solution to handle laws that could possibly work and integrate it well into the existing developer ecosystem.&lt;/p&gt;

&lt;p&gt;The idea has been well received with &lt;a href=&quot;http://www.wired.com/wiredenterprise/2012/08/bundestag/&quot;&gt;an article on Wired.com&lt;/a&gt; and articles on German IT news sites &lt;a href=&quot;http://www.heise.de/open/meldung/Entwicklungshistorie-von-Gesetzen-mit-Git-verfolgen-1662758.html&quot;&gt;Heise&lt;/a&gt; and &lt;a href=&quot;http://www.golem.de/news/bundesgit-ein-git-repository-fuer-deutsche-gesetze-1208-93709.html&quot;&gt;Golem&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The popularity can surely also be attributed to our marvelous Bundes-Git mascot, dubbed octo eagle, thought up by myself and designed by &lt;a href=&quot;https://kkaefer.com/&quot;&gt;Konstantin Käfer&lt;/a&gt; released under &lt;a href=&quot;https://creativecommons.org/publicdomain/zero/1.0/&quot;&gt;CC0&lt;/a&gt; (please go this way if you are &lt;a href=&quot;http://bundesgit.spreadshirt.de/&quot;&gt;interested in a t-shirt or hoodie&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;Design decisions explained&lt;/h3&gt;

&lt;p&gt;All other law storage formats use XML. But to me XML is neither human readable nor human writable. Let me get into the details of some of the design decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Git&lt;/strong&gt; because it's the most popular distributed version control system right now.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GitHub&lt;/strong&gt; because it's the most popular Git host right now and comes with some nice perks like Pull Requests and GitHub Pages.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Markdown&lt;/strong&gt; because any more structure like XML or JSON would make it harder for humans to read or write the format and diffs would be difficult to read.&lt;/li&gt;
&lt;li&gt;Naming files &lt;code&gt;index.md&lt;/code&gt; because it works nicely with &lt;strong&gt;Jekyll and GitHub Pages&lt;/strong&gt;, which render all laws into a currently very simple site.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;YAML Front Matter&lt;/strong&gt; is necessary for Jekyll but also serves as a nice metadata store for laws.&lt;/li&gt;
&lt;li&gt;Committing from branches with non-fast-forward merges because... uhmm. This is really up for discussion. I want to keep track of where changes originate, and branches are created for each law publication, but this diverges heavily from the clean commit history philosophy that e.g. the Linux kernel lives by.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;There are some more software development concepts that can be applied to the legislation process. Here are some fun things I'd like to try:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;a href=&quot;http://prose.io/&quot;&gt;prose.io&lt;/a&gt;-like editor to easily create law proposals and make a pull request.&lt;/li&gt;
&lt;li&gt;Measuring the complexity of corpus/laws/paragraphs and using Travis CI to test pull requests if they make the complexity worse. &lt;a href=&quot;http://www.clips.ua.ac.be/pages/pattern&quot;&gt;Pattern&lt;/a&gt; is a Python NLP library and they recently released a &lt;a href=&quot;http://www.clips.ua.ac.be/pages/pattern-de&quot;&gt;German module&lt;/a&gt; which I want to try on our laws.&lt;/li&gt;
&lt;li&gt;Testing foreign key integrity: are all referenced paragraphs still available?&lt;/li&gt;
&lt;li&gt;Create an informative visualization out of the Git log automatically like &lt;a href=&quot;http://blog.openingparliament.org/post/37650393621/what-opening-parliamentary-information-can-tell-us&quot;&gt;Gregor Aisch did by hand for the German political party law&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Let the German president sign off on commits to master.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The design decisions around Bundes-Git fit nicely into the Git/GitHub ecosystem but they are not set in stone. They also create some problems and annoyances that need to be fixed or circumvented. While I believe the general philosophy and the freshness of the approach point in the right direction, we clearly need more discussion.&lt;/p&gt;

&lt;h3&gt;Future happenings around Bundes-Git:&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We applied for funding at &lt;a href=&quot;http://innovation.globalintegrity.org/idea-submissions/2012/12/10/applying-version-control-to-the-legislative-process&quot;&gt;Testing 123 Global Integrity Innovation Fund&lt;/a&gt;. Bundes-Git definitely fits their criteria of brand new, innovative and high-risk. The decision will be made later this month, fingers crossed!&lt;/li&gt;
&lt;li&gt;I will talk at the &lt;a href=&quot;http://events.ccc.de/congress/2012/Fahrplan/events/5263.en.html&quot;&gt;29th Chaos Communication Congress about Bundes-Git&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;There will be a Bundes-Git Hacker Meetup in mid January. If you are interested, &lt;a href=&quot;https://terminplaner.dfn.de/foodle.php?id=hhndrdx742az60wf&quot;&gt;sign up here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;We decided that the language of discussion on GitHub will be German, but feel free to start a conversation on the &lt;a href=&quot;http://lists.okfn.org/mailman/listinfo/open-legislation&quot;&gt;OKF Open Legislation mailing list&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Also be sure to follow &lt;a href=&quot;https://twitter.com/bundesgit&quot;&gt;@bundesgit on Twitter&lt;/a&gt;!&lt;/strong&gt;&lt;/p&gt;
</content>
   <author>
     <name>Stefan Wehrmeyer</name>
   </author>
 </entry>
 
 <entry>
   <title>Speeding Up Your PyBossa App</title>
   <link href="http://okfnlabs.org/blog/2012/12/12/speeding-up-pybossa-apps.html"/>
   <updated>2012-12-12T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2012/12/12/speeding-up-pybossa-apps</id>
   <content type="html">&lt;p&gt;Thanks to the free &lt;a href=&quot;http://crowdcrafting.org&quot;&gt;crowd-crafting&lt;/a&gt; tool &lt;a href=&quot;http://dev.pybossa.com/&quot;&gt;PyBossa&lt;/a&gt;, nowadays the biggest challenge for successful crowd-sourcing is engaging users for participating in tasks, and to keep that motivation at a high level over time. Therefor the user experience of crowd-sourcing apps plays a crucial role.&lt;/p&gt;

&lt;p&gt;After participating in quite a few tasks myself, I found that the loading time in between two tasks was the most annoying thing. Doing crowd-sourcing tasks often feels like doing something stupid, and you really want to get things done as fast as possible. Sometimes it needs just a single click to solve a task, but then it takes seconds to load the next one.&lt;/p&gt;

&lt;p&gt;This is because all existing apps were designed in a synchronous fashion. The client requests a new task and presents it to the user as soon as it has been loaded. &lt;em&gt;After&lt;/em&gt; the user has solved the task, the result is submitted, and &lt;em&gt;after&lt;/em&gt; the result has been stored a new task is requested, and so on.&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;lightbox&quot; title=&quot;Process flow in current PyBossa apps&quot; href=&quot;/img/pybossa-workflow-old.png&quot;&gt;&lt;img src=&quot;/img/pybossa-workflow-old.png&quot; alt=&quot;current workflow&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt; (click to enlarge)&lt;/p&gt;

&lt;p&gt;Some apps even need to load additional information, such as images or data coming from external APIs. This loading time accumulates quickly, and will most probably lower the motivation of your users!&lt;/p&gt;

&lt;h2&gt;Pre-loading subsequent tasks == magic&lt;/h2&gt;

&lt;p&gt;The idea for reducing the loading time is actually pretty simple: We let the app load the next task &lt;em&gt;while&lt;/em&gt; the user is solving the current one. This results in a parallel process as described in the following chart:&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;lightbox&quot; title=&quot;Process flow with pre-loading in PyBossa apps&quot; href=&quot;/img/pybossa-workflow-new.png&quot;&gt;&lt;img src=&quot;/img/pybossa-workflow-new.png&quot; alt=&quot;proposed workflow&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To implement this in PyBossa, we needed to change the PyBossa API a little bit (thanks @&lt;a href=&quot;https://github.com/PyBossa/pybossa/commit/4f5bdd4698a1ac21f3021347cd9ec08e68f18bdc&quot;&gt;teleyinex&lt;/a&gt;). Before that change, consecutive calls to the &lt;a href=&quot;http://pybossa.readthedocs.org/en/latest/model.html#requesting-a-new-task-for-current-user&quot;&gt;newtask endpoint&lt;/a&gt; would return the same task again and again until the user had solved it. Now, with the newly introduced parameter &lt;strong&gt;offset&lt;/strong&gt;, you can request the next tasks in line.&lt;/p&gt;

&lt;p&gt;Another requirement for pre-loading of tasks is to keep the entire app on one page as otherwise the cached task would be lost. The rest of this post describes a smart way to implement this using &lt;a href=&quot;http://api.jquery.com/category/deferred-object/&quot;&gt;jQuery.Deferred&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Smart implementation using jQuery.Deferred&lt;/h2&gt;

&lt;p&gt;From the point of view of our PyBossa app, the pre-loading of the next task and the user solving the current one are two asynchronous actions running in parallel. We have to wait until both are completed before we can proceed to the next task.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://eng.wealthfront.com/2012/12/jquerydeferred-is-most-important-client.html&quot;&gt;This article&lt;/a&gt; reminded me of a smart way to implement this using jQuery.Deferred. The following function shows everything we need for our main loop.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;&lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;nextLoaded&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;loadTask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;taskSolved&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;presentTask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;when&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;nextLoaded&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;taskSolved&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;done&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;To start the loop, we need to load the first task and pass it to run.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;&lt;span class=&quot;nx&quot;&gt;loadTask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;done&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Now let's take a look at &lt;code&gt;loadTask()&lt;/code&gt;. The parameter offset is passed to the API. Once the task and everything else we might need has loaded, we mark the deferred as resolved, which passes the task on to the done handler. Finally we return a promise, a 'locked' version of the deferred object that callers can observe but not resolve themselves.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;&lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;loadTask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;offset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;offset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;offset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;taskLoaded&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;Deferred&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;getJSON&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;/api/app/&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;appid&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;/newtask?offset=&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;offset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// load more data if you need&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// and then, resolve Deferred&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;taskLoaded&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;resolve&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;taskLoaded&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;promise&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;We can use exactly the same method to model the user action. Therefore &lt;code&gt;presentTask()&lt;/code&gt; will return a deferred object, too. It gets resolved as soon as the user has solved the task and the answer has been successfully submitted to PyBossa.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;&lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;presentTask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;taskSolved&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;Deferred&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// update presenter html&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;.question&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;question&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// wait for user action&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;button.submit&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;off&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;click&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;on&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;click&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;answer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;foo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;Bar&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// fetch answer from UI&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;pybossa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;saveTask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;answer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;done&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;nx&quot;&gt;taskSolved&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;resolve&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;            
        &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;taskSolved&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;promise&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;And that's it.&lt;/p&gt;

&lt;p&gt;This method will significantly speed up your PyBossa app, especially if you need to fetch data from third party APIs. Remind yourself that even a speedup of a few seconds is a huge benefit for your voluntary users, as they are likely to go through this process quite often. And you really don't want to waste their time, do you?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update:&lt;/em&gt; Why not try the &lt;a href=&quot;http://crowdcrafting.org/app/flickrperson2/newtask&quot;&gt;FlickrPerson demo app the speedy way&lt;/a&gt;?&lt;/p&gt;
</content>
   <author>
     <name>Gregor Aisch</name>
   </author>
 </entry>
 
 <entry>
   <title>JavaScript Timeline Libraries - A Review</title>
   <link href="http://okfnlabs.org/blog/2012/12/04/javascript-timeline-libaries-a-review.html"/>
   <updated>2012-12-04T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2012/12/04/javascript-timeline-libaries-a-review</id>
   <content type="html">&lt;p&gt;This post is a rough and ready overview of various javascript timeline libraries that arose from research in creating a timeline view for &lt;a href=&quot;http://reclinejs.com/&quot;&gt;Recline JS&lt;/a&gt;. Note this material hung around on my hard disk for a few months so some of it may already be a little bit out of date!&lt;/p&gt;

&lt;p&gt;I want to start with a general comment. Timeline libraries consist of various components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data loading

&lt;ul&gt;
&lt;li&gt;Date parsing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Band (timeline) rendering&lt;/li&gt;
&lt;li&gt;Showing render info on individual items&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;For me a timeline visualization library need only be the second of these but most that I've come across do more.&lt;/p&gt;

&lt;p&gt;In fact a major issue in my opinion with most libraries is that they are &lt;em&gt;under-componentized&lt;/em&gt; - they don't separate cleanly into these different components and end up doing everything.&lt;/p&gt;

&lt;p&gt;To take one example, the Verite timeline (in my view one of the best libraries out there) has a whole bunch of its own custom date parsing built into an internal utility library, which is hard to override or replace, and it also has a large chunk of code just for loading from Google Docs and other data sources. (You can of course partly work around this, as I do in Recline, by parsing the dates yourself and then submitting them in a standardized form.)&lt;/p&gt;

&lt;p&gt;In my view, even if library authors do want to include these sorts of things, it would be good to do it in a way that allowed for a clean separation so that you could just use the parts you wanted (and/or over-ride parts more cleanly).&lt;/p&gt;
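
&lt;p&gt;As a rough illustration of the date workaround mentioned above, here is a minimal sketch that normalizes loosely formatted values to ISO 8601 dates before they are handed to a timeline library. The function names and the &lt;code&gt;date&lt;/code&gt; and &lt;code&gt;start&lt;/code&gt; field names are hypothetical, chosen for this example only; they are not the actual API of Recline or of any timeline library.&lt;/p&gt;

```javascript
// Hypothetical helper: convert a year, a "YYYY", "YYYY-M" or "YYYY-M-D"
// string, or a Date object into an ISO 8601 date string (YYYY-MM-DD).
// Returns null for anything it does not recognize.
function toIsoDate(value) {
  if (value instanceof Date) {
    return isNaN(value.getTime()) ? null : value.toISOString().slice(0, 10);
  }
  var pattern = /^([1-9]\d{3})(?:-(\d{1,2}))?(?:-(\d{1,2}))?$/;
  var match = pattern.exec(String(value).trim());
  if (match === null) {
    return null;
  }
  var year = Number(match[1]);
  var month = Number(match[2] || 1); // a missing month defaults to January
  var day = Number(match[3] || 1);   // a missing day defaults to the 1st
  var date = new Date(Date.UTC(year, month - 1, day));
  // Date.UTC silently rolls over out-of-range values (month 13, day 40),
  // so reject anything that did not survive the round trip.
  if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) {
    return null;
  }
  return date.toISOString().slice(0, 10);
}

// Hypothetical helper: copy each record and attach the normalized date
// under "start", leaving the original "date" value untouched.
function normalizeRecords(records) {
  return records.map(function (record) {
    return Object.assign({}, record, { start: toIsoDate(record.date) });
  });
}

var records = normalizeRecords([
  { title: 'A', date: '2012-6' },
  { title: 'B', date: 1984 }
]);
console.log(records[0].start); // 2012-06-01
console.log(records[1].start); // 1984-01-01
```

&lt;p&gt;Keeping the parsing in a small function of your own like this means the timeline component only ever sees one date format, which is the kind of separation argued for above.&lt;/p&gt;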

&lt;h2&gt;Propublica Timeline Setter&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://propublica.github.com/timeline-setter/&quot;&gt;http://propublica.github.com/timeline-setter/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HTML + JS

&lt;ul&gt;
&lt;li&gt;But requires a build step (using Ruby)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Very simple and compact design (nice!)&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;Verite Timeline&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://timeline.verite.co/&quot;&gt;http://timeline.verite.co/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Very elegant frontend design&lt;/li&gt;
&lt;li&gt;2 bands in timeline segment and tight integration of item display&lt;/li&gt;
&lt;li&gt;Includes much more than Timeline (e.g. sourcing data from google docs etc)&lt;/li&gt;
&lt;li&gt;Mozilla Public License (was GPL)&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;Simile Timeline&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.simile-widgets.org/timeline/&quot;&gt;http://www.simile-widgets.org/timeline/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The original open-source JS timeline, but less regularly updated and maintained today: &quot;As of Spring 2012, Exhibit is the only Simile widget seeing active development.&quot; The timeline control has not been updated since 2009 (see this &lt;a href=&quot;http://stackoverflow.com/questions/4700419/alternative-to-simile-timeline-for-timeline-visualization&quot;&gt;stackoverflow question for more&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;Chronoline&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://stoicloofah.github.com/chronoline.js/&quot;&gt;http://stoicloofah.github.com/chronoline.js/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Recently developed and updated&lt;/li&gt;
&lt;li&gt;MIT licensed&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;Timeglider&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/timeglider/jquery_widget&quot;&gt;https://github.com/timeglider/jquery_widget&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Non-open license (but was MIT licensed &lt;a href=&quot;https://github.com/timeglider/jquery_widget/tree/345442fa3dc7c66b23c36031a6569693ecf309bd&quot;&gt;earlier on&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;CHAPS Timeline&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://almende.github.com/chap-links-library/timeline.html&quot;&gt;http://almende.github.com/chap-links-library/timeline.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Looks pretty nice though CSS is not quite as elegant (probably fixable!)&lt;/li&gt;
&lt;li&gt;Not clear whether it supports multiple bands&lt;/li&gt;
&lt;/ul&gt;

</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>Following Money and Influence in the EU - the Open Interests Hackathon</title>
   <link href="http://okfnlabs.org/blog/2012/11/29/openinterests-review.html"/>
   <updated>2012-11-29T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2012/11/29/openinterests-review</id>
   <content type="html">&lt;p&gt;
Making sense of massive datasets that document
the processes of lobbying and public procurement at European Union level
is not an easy task. Yet a group of 25 journalists, developers, graphic
designers and activists worked together at the &lt;a
href=&quot;http://okfnlabs.org/events/hackdays/lobbying.html&quot;&gt;Open Interests
Europe&lt;/a&gt; hackathon last weekend to create tools and maps that make it
easier for citizens and journalists to see how lobbyists try to
influence European policies and to understand how governments award
contracts for public services. The hackathon was organised by the
European Journalism Centre and the Open Knowledge Foundation with
support from Knight-Mozilla OpenNews.&lt;/p&gt;


&lt;p&gt;
At the Google Campus Cafe in London, one group dived into European
lobbying data made available via an API: &lt;a
href=&quot;http://api.lobbyfacts.eu/&quot;&gt;api.lobbyfacts.eu&lt;/a&gt;. Created by a
group of five NGOs: Corporate Europe Observatory, Friends of the Earth
Europe, Lobby Control, Tactical Tech and the Open Knowledge Foundation,
the API gives access to up-to-date, structured information about persons
and organisations registered as lobbyists in the &lt;a
href=&quot;http://europa.eu/transparency-register/&quot;&gt;EU Transparency
Register&lt;/a&gt;. The API is part&amp;nbsp;of lobbyfacts.eu, a website that aims
to make it easy for anyone to track lobbyists and their influence at
European Union level, due to launch in January 2013.&lt;/p&gt;


&lt;p&gt;
One of the projects created with the lobby register data is a map
showing the locations of the offices of lobby firms based on their
turnover. The size of the bubbles on the map corresponds to the turnover
of the firm. Built by &lt;a href=&quot;https://twitter.com/pudo&quot;&gt;Friedrich
Lindenberg&lt;/a&gt;, the map is an overlay of a Stamen Design map with
Leafletjs.&lt;/p&gt;


&lt;p style=&quot;text-align: center&quot;&gt;
&lt;img alt=&quot;&quot;
src=&quot;https://lh4.googleusercontent.com/Gz7dg2T1mfSb2U7uDfotj2_giiIj8-gSIa5GEpw0SoB7negarpQpeHEW13-QmxOF5YkC_vHg7fyQNeFGU65iyfYdx_cmzxf8nfLYVigKXBamuD8Roe0C&quot;
style=&quot;height: 320px;width: 600px&quot; /&gt;&lt;/p&gt;


&lt;p style=&quot;text-align: center&quot;&gt;
&lt;em&gt;Screenshot of &lt;a
href=&quot;http://api.lobbyfacts.eu/map&quot;&gt;api.lobbyfacts.eu/map&lt;/a&gt;&amp;nbsp;showing
locations of lobbying firms across Europe&lt;/em&gt;&lt;/p&gt;


&lt;p&gt;
Other teams focused on data analysis, comparing the data from the EU
Transparency Register with that of the &lt;a
href=&quot;http://www.google.com/url?q=http%3A%2F%2Fec.europa.eu%2Ftransparency%2Fregexpert%2F&amp;amp;sa=D&amp;amp;sntz=1&amp;amp;usg=AFQjCNE2JbDkGcyojnufFa8-lw8sMFEpyA&quot;&gt;Register
of Expert Groups&lt;/a&gt;. Interesting leads for possible further
investigative work resulted from the comparison of the figures reported
by lobby firms in the Transparency Register with those collected by the
&lt;a
href=&quot;http://www.google.com/url?q=http%3A%2F%2Fwww.nbb.be%2Fpub%2Fhome.htm&amp;amp;sa=D&amp;amp;sntz=1&amp;amp;usg=AFQjCNEOiiu39BbbE6C8eJF7FI_8J1vT9Q&quot;&gt;National
Bank of Belgium&lt;/a&gt;. &amp;ldquo;Some companies underreported massively to
the National Bank of Belgium and some of them were making themselves
look bigger in the Transparency Register,&amp;rdquo; said Eric Wesselius,
leader of the lobby transparency challenge and co-founder of &lt;a
href=&quot;http://corporateeurope.org/&quot;&gt;Corporate Europe Observatory&lt;/a&gt;.
Wesselius&amp;rsquo; organisation will continue investigations in this
area.&lt;/p&gt;


&lt;p&gt;
A second group of journalists and graphic designers led by Jack
Thurston, an activist involved in &lt;a
href=&quot;http://fishsubsidy.org/&quot;&gt;Fishsubsidy.org&lt;/a&gt;, discussed how fish
subsidy data could be used for finding journalistic stories and explored
various ways in which the unintended consequences of the EU fish
subsidies programme, such as overfishing, could be compellingly
presented to the general public. &amp;nbsp;&lt;/p&gt;


&lt;p style=&quot;text-align: center&quot;&gt;
&lt;img alt=&quot;&quot;
src=&quot;https://lh3.googleusercontent.com/aRSFEijY87FeGF1vDcWwVJBYQlvNV1uordwuc7kVcjheSV6uDBvmRyKn9e4R5GgtFjTuA1-lh_1m2sAp-3S6qKb7QPW1ASFV3WIWWv_2ff9YX7gEWA0&quot;
style=&quot;width: 400px;height: 300px&quot; /&gt;&lt;/p&gt;


&lt;p style=&quot;text-align: center&quot;&gt;
&lt;em&gt;Sketch for interactive graphic showing fishing  vessels, their
trajectory and the subsidies they receive, made by graphic designer &lt;a
href=&quot;http://helenesears.carbonmade.com/&quot;&gt;Helene Sears&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;


&lt;p&gt;
A third group looked into European public procurement data.
&amp;ldquo;Public procurement is an area that is underreported by
journalists,&amp;rdquo; said data journalist Anders Pedersen, founder of &lt;a
href=&quot;http://opented.org/&quot;&gt;OpenTED&lt;/a&gt;. &amp;ldquo;9-25% of the GDP in the
EU is procurement - highest in the Netherlands where it is around 35%.
It&amp;rsquo;s a real issue in times of austerity who provides our
services,&amp;rdquo; he added.&lt;/p&gt;


&lt;p&gt;
Several &lt;a
href=&quot;http://www.google.com/url?q=https%3A%2F%2Fgithub.com%2Fmiha-stopar%2Fsandbox&amp;amp;sa=D&amp;amp;sntz=1&amp;amp;usg=AFQjCNEPCecCTO1CWVEDufnaAtGGR4Q4Tw&quot;&gt;scrapers&lt;/a&gt;
were built to access the data relating to winners of contracts and the
values of these contracts from the EU publication &lt;a
href=&quot;http://ted.europa.eu/TED/main/HomePage.do&quot;&gt;TED&lt;/a&gt;&amp;nbsp;(Tenders
Electronic Daily). A map of public procurement contracts by awarding
city was created using Google Fusion Tables by geocoding the original
CSV file, enriched with OpenStreetMap.&lt;/p&gt;


&lt;p style=&quot;text-align: center&quot;&gt;
&lt;img src=&quot;https://lh5.googleusercontent.com/oJnD9EYVOLshaLA4j3dsMHf4JxU3tzTHiQQcnjF8XFY20Psfm4Z4xlgWBOSePQzwE4SplYfyc_b_W19eCVtKMQgl00eDlDQDMxMjkkM2ghgmGYV6_AZc&quot;
style=&quot;height: 321px;width: 600px&quot; /&gt;&lt;/p&gt;


&lt;p style=&quot;text-align: center&quot;&gt;
&lt;em&gt;Screenshot of &lt;a
href=&quot;https://www.google.com/fusiontables/data?docid=1Cq8cKQ2r739is5gXegmX-fkI6ASAi5OOe9mepIo&amp;amp;pli=1#map:id=3&quot;&gt;map
of public procurement contracts&lt;/a&gt; by Benjamin Simatos and Martin
Stabe&lt;/em&gt;&lt;/p&gt;


&lt;p&gt;
Pedersen&amp;rsquo;s long term goal is to create an interface and an API for
EU public procurement data and to publish some more visualisations.
&amp;ldquo;A lot of the work that got done here [at the hackathon] we would
not have gotten done in the next months maybe. It really helped us push
far ahead in terms of ideas and in terms of getting stuff
done.&amp;rdquo;&lt;/p&gt;


&lt;p&gt;
This blog post is cross-posted from the &lt;a
href=&quot;http://datadrivenjournalism.net/news_and_analysis/Following_Money_and_Influence_in_the_EU_the_Open_Interests_Europe_Hackday&quot;&gt;Data-driven
Journalism Blog&lt;/a&gt;.
&lt;/p&gt;


&lt;p&gt;&lt;em&gt;Photo of participants at the hackathon by &lt;a
href=&quot;http://www.flickr.com/photos/fred2baro/&quot;&gt;Mehdi
Guiraud&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
</content>
   <author>
     <name>Liliana Bounegru</name>
   </author>
 </entry>
 
 <entry>
   <title>Scraping Data Behind a CAPTCHA</title>
   <link href="http://okfnlabs.org/blog/2012/11/13/scrapping-data-behind-a-captcha.html"/>
   <updated>2012-11-13T00:00:00-08:00</updated>
   <id>http://okfnlabs.org/blog/2012/11/13/scrapping-data-behind-a-captcha</id>
   <content type="html">&lt;p&gt;How much does the highest paid person in the Brazilian Federal Senate earns?
That's the question I asked myself a few weeks ago, and one that should be
easy to answer. In Brazil, every public body must publish its employees'
salaries online, but some do so in a terrible way. The Federal Senate is
one of these.&lt;/p&gt;

&lt;p&gt;To access its data you have to not only fill in your personal info, but also
solve a CAPTCHA for each salary you want to see. With no other tricks, it would
take ages to answer my question. I needed a way to gather all salaries and
compare them. But how to scrape a page that's &quot;protected&quot; behind a CAPTCHA?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/res/senado-gov-br-captcha.jpg&quot;
style=&quot;margin: 0 auto; display: block;&quot; alt=&quot;senado.gov.br CAPTCHA&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://decaptcher.com&quot;&gt;Decaptcher&lt;/a&gt; is a company that sells CAPTCHA-solving
services. They provide an API to which you send an image and get back the text it
contains. It's really cheap (US$ 1.38 per 1,000 CAPTCHAs) and works well, albeit a
bit slowly (30-40 seconds). They promise a success rate of over 95%, but I got only
43% in my tests, probably because the CAPTCHAs I'm sending are really hard to read.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://decaptcher.org/api&quot;&gt;Their API&lt;/a&gt; is simple to implement, with only 3
actions (upload, refund, and balance). There are examples in C# and PHP, and
I've hacked together &lt;a href=&quot;https://gist.github.com/4063793&quot;&gt;one in Ruby&lt;/a&gt;. For a
bit more than US$ 5.92, I was able to access and publish the salaries of
4,487 public servants at &lt;a href=&quot;http://senado.cc&quot;&gt;http://senado.cc&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are many other companies that offer the same service, like
&lt;a href=&quot;http://deathbycaptcha.com&quot;&gt;Death by CAPTCHA&lt;/a&gt;, &lt;a href=&quot;http://bypasscaptcha.com/&quot;&gt;Bypass CAPTCHA&lt;/a&gt;,
&lt;a href=&quot;http://www.beatcaptchas.com/&quot;&gt;Beat CAPTCHA&lt;/a&gt;, and &lt;a href=&quot;http://antigate.com/&quot;&gt;Antigate&lt;/a&gt;.
These services allow us to access public data that would be unreachable otherwise,
but they might be considered illegal in some countries. As we're not breaking the
CAPTCHA, but paying people to solve them, we should be fine. But don't take my word
for it: ask a lawyer.&lt;/p&gt;
</content>
   <author>
     <name>Vitor Baptista</name>
   </author>
 </entry>
 
 <entry>
   <title>Recline JS Search Demo</title>
   <link href="http://okfnlabs.org/blog/2012/11/01/recline-js-search-demo.html"/>
   <updated>2012-11-01T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/11/01/recline-js-search-demo</id>
   <content type="html">&lt;p&gt;&lt;a href=&quot;http://reclinejs.com/&quot;&gt;&lt;img src=&quot;http://assets.okfn.org/p/recline/img/logo.png&quot; style=&quot;float: right; height: 100px;&quot; alt=&quot;Recline JS&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've recently finished a demo for ReclineJS showing how it can be used to build
JS-based (ajax-style) search interfaces in minutes (or even seconds!):
&lt;a href=&quot;http://reclinejs.com/demos/search/&quot;&gt;http://reclinejs.com/demos/search/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because of Recline's &lt;a href=&quot;http://reclinejs.com/docs/backends.html&quot;&gt;pluggable backends&lt;/a&gt; you get out-of-the-box
support for data sources such as SOLR, Google Spreadsheet, ElasticSearch, or
plain old JSON or CSV &amp;ndash; see below for live examples using
different backends.&lt;/p&gt;

&lt;p&gt;Interested in using this yourself? The &lt;a href=&quot;http://reclinejs.com//docs/src/demo.search.app.html&quot;&gt;(prettified) source JS for the demo is
available&lt;/a&gt; (plus the &lt;a href=&quot;http://reclinejs.com/demos/search/demo.search.app.js&quot;&gt;raw version&lt;/a&gt;). It shows how simple
it is to build an app like this using Recline, and it has tips on how
to customize and extend it.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://reclinejs.com/demos/search/&quot;&gt;&lt;img src=&quot;http://i.imgur.com/Ja8SV.png&quot; alt=&quot;demo&quot; style=&quot;width: 100%&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;More Examples&lt;/h2&gt;

&lt;p&gt;In addition to the simple example with local data there are several other
examples showing how one can use this with other data sources including Google
Docs and SOLR:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A &lt;a href=&quot;http://reclinejs.com/demos/search/?backend=gdocs&amp;amp;url=https://docs.google.com/spreadsheet/ccc?key=0Aon3JiuouxLUdExXSTl2Y01xZEszOTBFZjVzcGtzVVE&quot;&gt;search example using a Google Docs spreadsheet listing Shell Oil spills in the Niger
delta&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A &lt;a href=&quot;http://reclinejs.com/demos/search/?backend=solr&amp;amp;url=http://openspending.org/api/search&quot;&gt;search example running off the OpenSpending SOLR
API&lt;/a&gt;
&amp;ndash; we suggest searching for something interesting like &quot;Drugs&quot; or &quot;Nuclear
power&quot;!&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;Code&lt;/h2&gt;

&lt;p&gt;The full &lt;a href=&quot;http://reclinejs.com//docs/src/demo.search.app.html&quot;&gt;(prettified) source JS for the demo is available&lt;/a&gt;
(plus the &lt;a href=&quot;http://reclinejs.com/demos/search/demo.search.app.js&quot;&gt;raw version&lt;/a&gt;) but here's a key code sample to give a flavour:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// ## Simple Search View&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// This is a simple bespoke Backbone view for the Search. It pulls together&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// various Recline UI components and the central Dataset and Query (state)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// object&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// It also provides simple support for customization e.g. of template for list of results&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// &lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//      var view = new SearchView({&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//        el: $(&amp;#39;some-element&amp;#39;),&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//        model: dataset,&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//        // EITHER a mustache template (passed a JSON version of recline.Model.Record)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//        // OR a function which receives a record in JSON form and returns html&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//        template: mustache-template-or-function&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//      });&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;SearchView&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;Backbone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;View&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;extend&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;initialize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;el&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;el&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;bindAll&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;render&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;recordTemplate&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;template&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Every time we do a search the recline.Dataset.records Backbone&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// collection will get reset. We want to re-render each time!&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;reset&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;render&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;templateResults&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;template&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;// overall template for this view&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;template&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39; \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;    &amp;lt;div class=&amp;quot;controls&amp;quot;&amp;gt; \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;      &amp;lt;div class=&amp;quot;query-here&amp;quot;&amp;gt;&amp;lt;/div&amp;gt; \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;    &amp;lt;/div&amp;gt; \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;    &amp;lt;div class=&amp;quot;total&amp;quot;&amp;gt;&amp;lt;h2&amp;gt;&amp;lt;span&amp;gt;&amp;lt;/span&amp;gt; records found&amp;lt;/h2&amp;gt;&amp;lt;/div&amp;gt; \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;    &amp;lt;div class=&amp;quot;body&amp;quot;&amp;gt; \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;      &amp;lt;div class=&amp;quot;sidebar&amp;quot;&amp;gt;&amp;lt;/div&amp;gt; \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;      &amp;lt;div class=&amp;quot;results&amp;quot;&amp;gt; \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;        {{{results}}} \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;      &amp;lt;/div&amp;gt; \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;    &amp;lt;/div&amp;gt; \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;    &amp;lt;div class=&amp;quot;pager-here&amp;quot;&amp;gt;&amp;lt;/div&amp;gt; \&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;  &amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 
  &lt;span class=&quot;c1&quot;&gt;// render the view&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;render&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;results&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;isFunction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;templateResults&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;results&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;toJSON&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;templateResults&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;\n&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;c1&quot;&gt;// templateResults is just for one result ...&lt;/span&gt;
      &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tmpl&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;{{#records}}&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;templateResults&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;{{/records}}&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;results&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;Mustache&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;render&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;tmpl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;toJSON&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;html&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;Mustache&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;render&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;template&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;results&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;results&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;el&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// Set the total records found info&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;el&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;find&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;.total span&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;recordCount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// ### Now setup all the extra mini-widgets&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// &lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Facets, Pager, QueryEditor etc&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;view&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;recline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;View&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;FacetViewer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;model&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;view&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;render&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;el&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;find&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;.sidebar&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;view&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;el&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;pager&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;recline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;View&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;Pager&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;queryState&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;el&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;find&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;.pager-here&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;pager&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;el&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;queryEditor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;recline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;View&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;QueryEditor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;queryState&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;el&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;find&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;.query-here&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;queryEditor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;el&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>Labs Show and Tell - 26th October!</title>
   <link href="http://okfnlabs.org/blog/2012/10/23/show-and-tell.html"/>
   <updated>2012-10-23T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/10/23/show-and-tell</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;http://assets.okfn.org/p/labs/img/tent.png&quot; style=&quot;margin-left: 30px; float: right;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We're having the next Show and Tell on Friday, &lt;a href=&quot;http://www.timeanddate.com/worldclock/fixedtime.html?iso=20121026T1430&amp;amp;p1=136&quot;&gt;26 October at 2:30 pm BST&lt;/a&gt; via Google Hangout on Air. As usual, the URL will be posted on &lt;a href=&quot;https://plus.google.com/108417336285743833546/posts&quot;&gt;OKFN Labs' G+ Page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you'd like to present, add your name to &lt;a href=&quot;http://okfnpad.org/show-and-tell-Oct-26&quot;&gt;the list&lt;/a&gt;. Remember, &lt;a href=&quot;http://webchat.freenode.net/?channels=okfn&quot;&gt;#okfn on irc.freenode.net&lt;/a&gt; will be the backchannel for discussion and questions, so don't forget to hang out there.&lt;/p&gt;

&lt;h3&gt;What's Show and Tell?&lt;/h3&gt;

&lt;p&gt;Have you built some cool tech you want to show everyone? Played around with some data? The Labs Show and Tell is your chance to share it with the OKFN Labs community! You get 2 to 5 minutes to show us what you built!&lt;/p&gt;

&lt;h3&gt;Missed the last one?&lt;/h3&gt;

&lt;p&gt;On Oct 12, 2012, we had the first Labs Show and Tell. Here's what we talked about:&lt;/p&gt;

&lt;h4&gt;&lt;a href=&quot;http://promiscuity.tentacleriot.eu/&quot;&gt;Scientific Promiscuity&lt;/a&gt; - Michael Bauer&lt;/h4&gt;

&lt;p&gt;Scientific papers are rarely written by a single person. Usually many authors come together to work on a specific issue. This visualization uses data obtained from Pubmed to show collaboration between authors.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/dashboard.png&quot; style=&quot;margin-left: 30px; float: right;&quot; /&gt;&lt;/p&gt;

&lt;h4&gt;&lt;a href=&quot;http://activityapi.herokuapp.com/&quot;&gt;Activity API&lt;/a&gt; - &lt;a href=&quot;https://github.com/okfn/activityapi&quot;&gt;Code&lt;/a&gt; - Tom Rees&lt;/h4&gt;

&lt;p&gt;Activity API scrapes multiple data sources and consolidates everything into a single PostgreSQL database. It scrapes GitHub, Twitter and mailing list posts.&lt;/p&gt;

&lt;h4&gt;&lt;a href=&quot;http://okfnlabs.org/dashboard/#project/labs&quot;&gt;Dashboard&lt;/a&gt; - &lt;a href=&quot;https://github.com/okfn/dashboard&quot;&gt;Code&lt;/a&gt; - Tom Rees&lt;/h4&gt;

&lt;p&gt;The OKFN Community Dashboard provides an overview of community activity. We have a flourishing and diverse set of activities and it can be hard, even for people 'inside' to see what is going on. The Dashboard helps us see quickly what is going on.&lt;/p&gt;

&lt;h4&gt;&lt;a href=&quot;http://nomenklatura.okfnlabs.org/&quot;&gt;nomenklatura&lt;/a&gt; - &lt;a href=&quot;https://github.com/pudo/nomenklatura&quot;&gt;Code&lt;/a&gt; - Friedrich&lt;/h4&gt;

&lt;p&gt;A lot of time in data wrangling is spent making mappings of variant names to a canonical form. This app provides an easy-to-use, web-based method for creating such mappings, to allow for a more managed data cleansing pipeline.&lt;/p&gt;

&lt;h4&gt;&lt;a href=&quot;https://github.com/okfn/messytables&quot;&gt;Messy Tables&lt;/a&gt; - Friedrich&lt;/h4&gt;

&lt;p&gt;A library for dealing with messy tabular data in several formats, guessing types and detecting headers.&lt;/p&gt;

&lt;h4&gt;&lt;a href=&quot;https://github.com/stefanw/froide&quot;&gt;froide&lt;/a&gt; - stefanw&lt;/h4&gt;

&lt;p&gt;Froide is a Freedom Of Information tracker. The name comes from Freedom of Information (de). Also Froide sounds like Freude which is German for joy.&lt;/p&gt;

&lt;h4&gt;&lt;a href=&quot;https://travis-ci.org/#!/PyBossa/pybossa&quot;&gt;PyBossa on Travis CI&lt;/a&gt; - Nigel&lt;/h4&gt;

&lt;p&gt;PyBossa now uses Travis CI for continuous integration. This makes reviewing pull requests easier, since we can see the test status right away.&lt;/p&gt;
</content>
   <author>
     <name>Nigel Babu</name>
   </author>
 </entry>
 
 <entry>
   <title>Wrangling dirty data with messytables.</title>
   <link href="http://okfnlabs.org/blog/2012/10/22/messytables.html"/>
   <updated>2012-10-22T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/10/22/messytables</id>
   <content type="html">&lt;p&gt;One of the largest data collection projects we have done so far
has been the &lt;a href=&quot;http://openspending.org/resources/gb-spending/&quot;&gt;consolidation of the UK's departmental expenditure&lt;/a&gt;.
Over 370 different government entities have published a total
of more than 7000 spreadsheets. Many of those have obviously
been hand-crafted or at least manually processed. Our goal was to
consolidate the contained information into a single
spreadsheet, discarding all the eccentricities included by the individual
publishers.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/okfn/messytables&quot;&gt;messytables&lt;/a&gt; is a simple
Python library that tries to extract tabular contents from
spreadsheet documents created by human editors. Often, even files
released as CSV or Excel are still not easy to parse
programmatically. Some people like to start off spreadsheets with
a title column or some metadata, while others use inappropriate
formats to represent numbers or dates.&lt;/p&gt;

&lt;p&gt;The tool offers a set of functions that help to make parsing data
easier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A &lt;strong&gt;headers detector&lt;/strong&gt; tries to determine which row in a spreadsheet
contains the actual header definitions (as opposed to any trailing
content).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;type detection&lt;/strong&gt; attempts to guess the data type for each column,
including a wide range of commonly used date formats.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;support for &lt;strong&gt;streaming data&lt;/strong&gt;, so that extremely large tables can
be processed without loading the entire data into memory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;and, of course, it supports a &lt;strong&gt;range of spreadsheet types&lt;/strong&gt; - from
trusty CSV to Excel and even OpenOffice formats.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
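
&lt;p&gt;To make the first two ideas concrete, here is a minimal sketch in plain Python. It is not messytables' own code or API: the helper names (&lt;code&gt;guess_header_row&lt;/code&gt;, &lt;code&gt;guess_type&lt;/code&gt;), the sample data and the deliberately naive heuristics are all assumptions made up for illustration, and the real library is considerably more thorough.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;python&quot;&gt;import csv
import io
from datetime import datetime


def is_number(text):
    """True if the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def is_date(text):
    """True if the cell is an ISO date (the only format this sketch knows)."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def guess_header_row(rows):
    """Index of the first full-width row whose cells are all non-empty, non-numeric text."""
    width = max(len(row) for row in rows)
    for index, row in enumerate(rows):
        cells = [cell.strip() for cell in row]
        if len(cells) == width and all(cells) and not any(is_number(c) for c in cells):
            return index
    return 0  # fall back to the first row


def guess_type(values):
    """Classify a column as "number", "date" or "string", ignoring empty cells."""
    values = [v for v in values if v.strip()]
    if values and all(is_number(v) for v in values):
        return "number"
    if values and all(is_date(v) for v in values):
        return "date"
    return "string"


# Hypothetical spreadsheet export: a title line and a blank line precede the real header.
SAMPLE = """Departmental spending,,
,,
Date,Supplier,Amount
2012-10-01,Acme Ltd,1200.50
2012-10-02,Widgets plc,300
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
offset = guess_header_row(rows)
headers = rows[offset]
body = rows[offset + 1:]
types = {name: guess_type([row[i] for row in body]) for i, name in enumerate(headers)}
print(offset, headers, types)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;On this sample the sketch picks the third row as the header and classifies the columns as date, string and number. Real-world files need far more forgiving rules (many date formats, currency symbols, ragged rows), which is the kind of work the library is meant to take off your hands.&lt;/p&gt;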


&lt;p&gt;We've since also started using messytables to load data into the
&lt;a href=&quot;http://ckan.org/2012/10/22/ckan-1-8-released/&quot;&gt;data API of CKAN&lt;/a&gt;,
where it serves as the ETL for the datastore and related
&lt;a href=&quot;http://reclinejs.com/&quot;&gt;ReclineJS&lt;/a&gt; previews.&lt;/p&gt;

&lt;p&gt;If you're interested, check out the &lt;a href=&quot;http://messytables.readthedocs.org/en/latest/index.html&quot;&gt;messytables documentation&lt;/a&gt;
and the &lt;a href=&quot;https://github.com/openspending/dpkg-uk25k/blob/master/extract.py&quot;&gt;uk25k scripts&lt;/a&gt;
which use it to gather UK government finance.&lt;/p&gt;

&lt;p&gt;Of course, messytables is not a cure-all and only useful for reading
data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://docs.python-tablib.org/en/latest/&quot;&gt;tablib&lt;/a&gt;, for example, has
a fantastic API that makes writing, analyzing and converting data a
breeze.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://csvkit.readthedocs.org/en/latest/index.html&quot;&gt;csvkit&lt;/a&gt; has a
set of command line utilities that should be pre-installed on any
computer.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;But when it comes to tables that are a complete mess: give it a try!&lt;/p&gt;
</content>
   <author>
     <name>Friedrich Lindenberg</name>
   </author>
 </entry>
 
 <entry>
   <title>Open Interests Hackathon in London, 24-25 November</title>
   <link href="http://okfnlabs.org/blog/2012/10/15/openinterests.html"/>
   <updated>2012-10-15T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/10/15/openinterests</id>
   <content type="html">&lt;p&gt;
The &lt;a href=&quot;http://ejc.net&quot;&gt;European Journalism Centre&lt;/a&gt; and the Open Knowledge Foundation,
sponsored by &lt;a href=&quot;http://mozillaopennews.org/&quot;&gt;Knight-Mozilla
  OpenNews&lt;/a&gt;, invite you to the &lt;a href=&quot;/events/hackdays/lobbying.html&quot;&gt;Open Interests
  Hackathon&lt;/a&gt; to track the interests and money flows which shape European policy.
&lt;/p&gt;
&lt;p&gt;
  &lt;strong&gt;When&lt;/strong&gt;: 24-25 November
&lt;/p&gt;
&lt;p&gt;
  &lt;strong&gt;Where&lt;/strong&gt;: Google Campus Cafe, 4-5 Bonhill Street, EC2A 4BX London
&lt;/p&gt;

&lt;p&gt;
How EU money is spent is an issue that concerns everyone who pays taxes to the EU. As the influence of Brussels lobbyists grows, it is increasingly important to draw the connections between lobbying, policy-making and funding. Journalists and activists need browsable databases, tools and platforms to investigate lobbyists’ influence and where the money goes in the EU. Join us and help build these tools!
&lt;/p&gt;
&lt;p&gt;
Open Interests Europe brings together developers, designers, activists, journalists and other geeks for two days of collaboration, learning, fun, intense hacking and app building.
&lt;/p&gt;

&lt;div class=&quot;teaser boxed&quot;&gt;
  &lt;a href=&quot;/events/hackdays/lobbying.html&quot;&gt;Visit the event page to learn
    more&lt;/a&gt;
&lt;/div&gt;
</content>
   <author>
     <name>Velichka Dimitrova</name>
   </author>
 </entry>
 
 <entry>
   <title>Labs Show and Tell - All Welcome!</title>
   <link href="http://okfnlabs.org/blog/2012/10/10/show-and-tell.html"/>
   <updated>2012-10-10T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/10/10/show-and-tell</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;http://assets.okfn.org/p/labs/img/tent.png&quot; style=&quot;margin-left: 30px; float: right;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built an app or tool you want to show people? Played around with some
interesting data? Know of a new development people should know about? Want to
find out what others are doing?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Come to the &lt;strong&gt;Show and Tell this Friday&lt;/strong&gt; and share what you are up to with the
community!&lt;/p&gt;

&lt;h3&gt;Sign up&lt;/h3&gt;

&lt;p&gt;Want to participate? Just add your name to &lt;a href=&quot;http://okfnpad.org/show-and-tell-Oct-12&quot;&gt;the list on the etherpad&lt;/a&gt;! If
you want to present just add a brief title and/or short description.&lt;/p&gt;

&lt;p&gt;Remember, &lt;a href=&quot;http://webchat.freenode.net/?channels=okfn&quot;&gt;#okfn on irc.freenode.net&lt;/a&gt; will be the backchannel for
discussion and questions, so feel free to jump in there if you have questions or
queries or just want to shoot the breeze.&lt;/p&gt;

&lt;h3&gt;When?&lt;/h3&gt;

&lt;p&gt;Friday, &lt;a href=&quot;http://www.timeanddate.com/worldclock/fixedtime.html?iso=20121012T1430&amp;amp;p1=136&quot;&gt;12 October at 2:30 pm BST - that's 10:30am EST, 3:30pm CET etc&lt;/a&gt;.
The session will last &lt;strong&gt;30m with presentation slots of 2-5m&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;Where?&lt;/h3&gt;

&lt;p&gt;Google Hangout on Air and &lt;a href=&quot;http://webchat.freenode.net/?channels=okfn&quot;&gt;#okfn on irc.freenode.net&lt;/a&gt;. We'll post the on air
URL on &lt;a href=&quot;https://plus.google.com/108417336285743833546/posts&quot;&gt;OKFN Labs' G+ Page&lt;/a&gt;, here and on &lt;a href=&quot;http://twitter.com/okfnlabs&quot;&gt;OKFN Labs twitter&lt;/a&gt;.&lt;/p&gt;
</content>
   <author>
     <name>Nigel Babu</name>
   </author>
 </entry>
 
 <entry>
   <title>Data Catalogues are People!</title>
   <link href="http://okfnlabs.org/blog/2012/09/25/datacatalogues.html"/>
   <updated>2012-09-25T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/09/25/datacatalogues</id>
   <content type="html">&lt;p&gt;Last week, &lt;a href=&quot;https://twitter.com/matejkurian&quot;&gt;Matej Kurian&lt;/a&gt; published
a message on the &lt;a href=&quot;http://lists.okfn.org/mailman/listinfo/okfn-labs&quot;&gt;okfn-labs mailing&lt;/a&gt;
list, &lt;a href=&quot;http://lists.okfn.org/pipermail/okfn-labs/2012-September/000376.html&quot;&gt;describing&lt;/a&gt; the various sources he had discovered for
machine-readable excerpts of the EU's joint procurement system, TED.
What struck me about this message was that, apparently, this polite
and brilliant policy wonk had turned into something strange: into a
data catalogue.&lt;/p&gt;

&lt;p&gt;While not quite a Kafka-grade transformation, it's an odd turn to
take for a researcher. But Matej is not the only one: the team of
&lt;a href=&quot;http://farmsubsidy.org/&quot;&gt;FarmSubsidies.org&lt;/a&gt; has experienced a similar re-definition, as did
the ERDF researchers at the &lt;a href=&quot;http://www.thebureauinvestigates.com/&quot;&gt;Bureau of Investigative Journalists&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The best data catalogues today are well-informed people.&lt;/p&gt;

&lt;p&gt;When I talk to journalists about data acquisition, they seem to know
this already: it's often not just about where to look; it's even more
important to know who to talk to. But why does this observation from a
telephone-and-filofax world hold true even in digital space, where
every bit of knowledge is supposed to be only a click away?&lt;/p&gt;

&lt;p&gt;I believe that some blame goes to the simplistic model underlying our
efforts to catalogue data: the question of where to find a dataset is
certainly important, but for those actually working with the data it's
just not enough. Once you dig into data, other questions rise to the
foreground:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;How do the different available datasets interact and integrate? Does
the data I am looking for even make sense on its own - or do I need
to combine several sources? Take, for example, the UK's &lt;em&gt;Whole of
Government Accounts&lt;/em&gt;: while data.gov.uk &lt;a href=&quot;http://data.gov.uk/dataset/coins&quot;&gt;lists&lt;/a&gt; a few gigabytes worth of
downloads for this dataset, it is completely impossible to interpret
the data without also fetching Excel files (and PDF guidance) off the
Treasury web site, the Department of Communities and Local Government
site and - bonus points - emailing the Treasury for their internal
toolkit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How complete and up-to-date is the data? What technical and political
constraints apply to the publication? Again, FarmSubsidies provide a
nice example, as a 2010 European Court of Justice verdict has severely
limited the availability of the data - leading to an oddly limited
dataset today.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Who else is working with this data and what are they doing? Are there
derivative datasets that I should use instead of the source material?
It may be worth knowing, for example, that as well as browsing the
6000-odd departmental spending spreadsheets, journalists can also search
across a consolidated version of this data on OpenSpending.org.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;But why are current data portals so bad at capturing such information?
Certainly, adding a few comment boxes and an app gallery can do a good
job glossing over the problem, but the real problems seem to lie deeper in
the technology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Datasets are a useless unit. A while ago, &lt;a href=&quot;http://richard.cyganiak.de/&quot;&gt;Richard Cyganiak&lt;/a&gt; defined a
dataset as &quot;a set of data&quot; - which I assume is a computer scientist's
way of telling you to get lost. And while I'm not normally a big fan
of LOD-clouds, they got this right: all the interesting stuff is
happening in between datasets. Whether it's about reconstructing a
process across several datasets or finding out about geographical and
temporal coverage - datasets are at best building blocks, more often
they are just arbitrary. So maybe it's time to think about other
mechanisms to represent data sources: what about policy maps and
government wiring plans?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even worse, the metadata we keep about datasets is mostly based on a
bureaucratic mindset: they're library-inspired, static index
cards that hope to represent datasets, while data are really subject
to complex processes both within and outside the institutions that
produce them. For anyone using the data, activity metadata is
the interesting part. We've already figured this out for software,
where libraries like FreshMeat and SourceForge have been replaced by
activity-driven platforms like GitHub. The key aspect here is that
GitHub doesn't require me to explicitly make metadata - the relevant
narrative is simply summarized from my working pattern.&lt;/p&gt;

&lt;p&gt;Of course, all of this is just a long way of saying that the best
metadata is in the data itself. So unless you're working on the LHC
stuff there really isn't much of a reason to separate the two any
longer: let's make public, audit-trailed databases that report on
themselves. This, of course, is easier said than done as it implies
that all data will fit into one storage mechanism. In the real
world (i.e. outside Linked Data land), this is unlikely to be true
of structured data any time soon.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Still, even after fixing our model of how we talk about datasets on the
web, I think we would still find that the best way to ensure that people
collaborate around data is community-building: creating networks that
garden the commons. Perhaps we should start cataloguing those.&lt;/p&gt;
</content>
   <author>
     <name>Friedrich Lindenberg</name>
   </author>
 </entry>
 
 <entry>
   <title>WikipediaJS - accessing Wikipedia article data through Javascript</title>
   <link href="http://okfnlabs.org/blog/2012/09/10/wikipediajs-a-javascript-library-for-accessing-wikipedia-article-information.html"/>
   <updated>2012-09-10T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/09/10/wikipediajs-a-javascript-library-for-accessing-wikipedia-article-information</id>
   <content type="html">&lt;p&gt;&lt;a href=&quot;http://okfnlabs.org/wikipediajs/&quot;&gt;WikipediaJS&lt;/a&gt; is a simple JS library for accessing information in Wikipedia articles such as dates, places, abstracts etc.&lt;/p&gt;

&lt;p&gt;The library is the work of Labs member &lt;a href=&quot;http://rufuspollock.org/&quot;&gt;Rufus
Pollock&lt;/a&gt;. In essence, it is a small wrapper around the data and &lt;a
href=&quot;http://dbpedia.org/sparql/&quot;&gt;APIs&lt;/a&gt; of the &lt;a
href=&quot;http://dbpedia.org/&quot;&gt;DBPedia project&lt;/a&gt; and it is they who have done all
the heavy lifting of extracting structured data from Wikipedia - huge credit
and thanks to DBPedia folks!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://okfnlabs.org/wikipediajs/&quot;&gt;&lt;img src=&quot;http://farm9.staticflickr.com/8029/7961793920_7436dba276_c.jpg&quot; style=&quot;display: block; margin: auto; width: 80%; border: #ccc 5px solid; margin-top: 20px; margin-bottom: 20px;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Demo and Examples&lt;/h3&gt;

&lt;p&gt;A demo is included and you can see some examples of the library in action at the following links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://okfnlabs.org/wikipediajs/?url=http://en.wikipedia.org/wiki/Normandy_landings&quot;&gt;http://okfnlabs.org/wikipediajs/?url=http://en.wikipedia.org/wiki/Normandy_landings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://okfnlabs.org/wikipediajs/?url=http://en.wikipedia.org/wiki/Securitas_AB&quot;&gt;http://okfnlabs.org/wikipediajs/?url=http://en.wikipedia.org/wiki/Securitas_AB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://okfnlabs.org/wikipediajs/?url=http://en.wikipedia.org/wiki/Richard_I_of_England&quot;&gt;http://okfnlabs.org/wikipediajs/?url=http://en.wikipedia.org/wiki/Richard_I_of_England&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://okfnlabs.org/wikipediajs/?url=http://en.wikipedia.org/wiki/CERN&quot;&gt;http://okfnlabs.org/wikipediajs/?url=http://en.wikipedia.org/wiki/CERN&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;Colophon&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/okfn/wikipediajs&quot;&gt;WikipediaJS source code is on github&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;One of the reasons for creating WikipediaJS is that we think it can be
useful in &lt;a href=&quot;http://timeliner.reclinejs.com/&quot;&gt;Timeliner&lt;/a&gt; and other apps as a
way to quickly add new items to your timeline.&lt;/p&gt;
</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>Timeliner - Make Nice Timelines Fast</title>
   <link href="http://okfnlabs.org/blog/2012/08/08/timeliner-make-nice-timelines-fast.html"/>
   <updated>2012-08-08T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/08/08/timeliner-make-nice-timelines-fast</id>
   <content type="html">&lt;p&gt;As part of the &lt;a href=&quot;http://reclinejs.com/&quot;&gt;Recline&lt;/a&gt; launch I put together quickly some very simple demo apps one of which was called Timeliner:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://timeliner.reclinejs.com/&quot;&gt;http://timeliner.reclinejs.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This uses the Recline timeline component (which itself is a relatively thin wrapper around the &lt;em&gt;excellent&lt;/em&gt; &lt;a href=&quot;http://timeline.verite.co/&quot;&gt;Verite timeline&lt;/a&gt;) plus the Recline Google docs backend to provide an easy way for people to make timelines backed by a Google Docs spreadsheet.&lt;/p&gt;

&lt;p&gt;As an example of use, I started work on a &lt;a href=&quot;http://timeliner.reclinejs.com/?backend=gdocs&amp;amp;url=https://docs.google.com/spreadsheet/ccc?key=0Aon3JiuouxLUdDQ3QlJhOHJnS2x0NkxibUp1YnYwR1E%23gid=0#explorer&quot;&gt;&quot;spending stories&quot; timeline about the bankruptcy of US cities (especially in California)&lt;/a&gt; as a result of the &quot;Great Recession&quot; (&lt;a href=&quot;https://docs.google.com/spreadsheet/ccc?key=0Aon3JiuouxLUdDQ3QlJhOHJnS2x0NkxibUp1YnYwR1E#gid=0&quot;&gt;source spreadsheet&lt;/a&gt;). I've also created an example &lt;a href=&quot;http://timeliner.reclinejs.com/?backend=gdocs&amp;amp;url=https://docs.google.com/spreadsheet/ccc?key=0Aon3JiuouxLUdDQ3QlJhOHJnS2x0NkxibUp1YnYwR1E%23gid=0#explorer&quot;&gt;timeline of major wars&lt;/a&gt;, a screenshot of which I've inlined:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://farm9.staticflickr.com/8285/7508403206_420de3ce5e_b.jpg&quot; style=&quot;width: 600px; margin: auto; display: block; margin-top: 20px;&quot; /&gt;&lt;/p&gt;

&lt;h3&gt;Code&lt;/h3&gt;

&lt;p&gt;Source code for the Timeliner is here: &lt;a href=&quot;https://github.com/okfn/timeliner&quot;&gt;https://github.com/okfn/timeliner&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have suggestions for improvements, want to see the ones that already exist, or, &lt;em&gt;gasp&lt;/em&gt;, find a bug please see the issue tracker: &lt;a href=&quot;https://github.com/okfn/timeliner/issues&quot;&gt;https://github.com/okfn/timeliner/issues&lt;/a&gt;&lt;/p&gt;
</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>The Data Transformer - Cleaning Up Data in the Browser</title>
   <link href="http://okfnlabs.org/blog/2012/07/31/data-transformer-cleaning-up-data-in-the-browser.html"/>
   <updated>2012-07-31T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/07/31/data-transformer-cleaning-up-data-in-the-browser</id>
   <content type="html">&lt;p&gt;This a brief post to announce an alpha prototype version of the Data Transformer, an app to let you clean up data in the browser using javascript:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://transformer.datahub.io/&quot;&gt;http://transformer.datahub.io/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;2m overview video:&lt;/h3&gt;

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;http://www.youtube.com/embed/zM1USNaEcVQ&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;1&quot; style=&quot;margin-bottom: 30px;&quot;&gt;&amp;nbsp;&lt;/iframe&gt;


&lt;h3&gt;What does this app do?&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;You load a CSV file from github (fixed at the moment but soon to be customizable)&lt;/li&gt;
&lt;li&gt;You write simple javascript to edit this file (uses ReclineJS transform and grid views + CSV backends -- here's the &lt;a href=&quot;http://reclinejs.com/demos/multiview/?currentView=transform&quot;&gt;original ReclineJS transform demo&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;You save this updated file back to github (via oauth login - this utilizes Michael's great work in Prose!)&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;This prototype was hacked together in an afternoon a couple of weeks ago when I was fortunate enough to spend an afternoon with Michael Aufreiter, Chris Herwig, Mike Morris and others at the Development Seed offices. It builds on ReclineJS + oauth / github connectors borrowed from Prose.&lt;/p&gt;

&lt;p&gt;It's part of an ongoing plan to create a &quot;Data Orchestra&quot; of lightweight data services that can play nicely together with each
other and connect to things like the DataHub (or GitHub ...): &lt;a href=&quot;http://notebook.okfn.org/2012/06/22/datahub-small-pieces-loosely-joined/&quot;&gt;http://notebook.okfn.org/2012/06/22/datahub-small-pieces-loosely-joined/&lt;/a&gt;&lt;/p&gt;
</content>
   <author>
     <name>Rufus Pollock</name>
   </author>
 </entry>
 
 <entry>
   <title>Displaying PyBossa Urban Parks Data on a 3D Globe</title>
   <link href="http://okfnlabs.org/blog/2012/07/14/pybossa-urban-parks-data-on-3d-globe.html"/>
   <updated>2012-07-14T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/07/14/pybossa-urban-parks-data-on-3d-globe</id>
   <content type="html">&lt;p&gt;Labs member &lt;a href=&quot;http://twitter.com/teleyinex&quot;&gt;Daniel Lombraña González&lt;/a&gt; has built a &lt;a href=&quot;http://teleyinex.github.com/pybossa-urbanpark-globe/&quot;&gt;3-d globe showing the locatoins of urban parks around the world&lt;/a&gt; as located by volunteers using the &lt;a href=&quot;http://pybossa.com/app/urbanpark&quot;&gt;Pybossa Urban Park geocoding app&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;http://teleyinex.github.com/pybossa-urbanpark-globe/&quot;&gt;http://teleyinex.github.com/pybossa-urbanpark-globe/&lt;/a&gt;&lt;/strong&gt; &amp;mdash; (&lt;a href=&quot;https://github.com/teleyinex/pybossa-urbanpark-globe&quot;&gt;Source code&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://p.twimg.com/AxxDoY9CIAET_0L.png:large&quot; alt=&quot;screenshot&quot; /&gt;&lt;/p&gt;

&lt;h3&gt;Background&lt;/h3&gt;

&lt;p&gt;The Urban Parks geo-coding application is a micro-tasking app running on &lt;a href=&quot;http://pybossa.com&quot;&gt;PyBossa&lt;/a&gt;. In the app, volunteers are asked to find an urban park in cities around the world. The volunteers use a web map to browse the city, and then submit an answer: either the coordinates of the urban park, set by placing a marker on the map, or the response &quot;I don't find any park&quot;.&lt;/p&gt;

&lt;p&gt;More details about PyBossa can be found on the official site &lt;a href=&quot;http://pybossa.com&quot;&gt;http://pybossa.com&lt;/a&gt; and also in the &lt;a href=&quot;http://docs.pybossa.com&quot;&gt;online documentation&lt;/a&gt;.&lt;/p&gt;
</content>
   <author>
     <name>Daniel Lombraña Gonzalez</name>
   </author>
 </entry>
 
 <entry>
   <title>dataissues.org - public issue tracking for data defects</title>
   <link href="http://okfnlabs.org/blog/2012/07/10/dataissues.html"/>
   <updated>2012-07-10T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/07/10/dataissues</id>
   <content type="html">&lt;p&gt;&lt;em&gt;On June 21st, the Knight News Challenge Round on Data ended. The day before,
&lt;a href=&quot;http://rufuspollock.org/&quot;&gt;Rufus&lt;/a&gt;, &lt;a href=&quot;https://twitter.com/rossjones&quot;&gt;Ross&lt;/a&gt; and
I sat down to write out some ideas that we'd been discussing for a while. While
we submitted proposals for &lt;a href=&quot;/2012/07/09/grano.html&quot;&gt;Grano&lt;/a&gt; and &lt;a href=&quot;http://newschallenge.tumblr.com/post/25576949597/data-protocols-rough-consensus-running-code-and&quot;&gt;DataProtocols&lt;/a&gt;, we decided to hold back on this idea for another round. Still, sharing is caring.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. What do you propose to do? [20 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ll create a web service where data wranglers and consumers can log errors arising from processing, viewing or using data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. How will your project make data more useful? [50 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All data has errors. While data quality is often talked about, the best practice for data apps is often to have half a paragraph on the ‘about’ page. We want to build a service that is useful to data wranglers, but can also serve as documentation for end-users and as a basis for further discussion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. How is your project different from what already exists? [30 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Error reporting for software is either done as task tickets (e.g. github.com) or by capturing raw application output (e.g. exceptional.io). For data, we want to combine these two approaches to let users group recurring errors into issues that can then be discussed and fixed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Why will it work? [100 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While data processing workflows differ from dataset to dataset, the types of errors that occur are often quite similar and can be stored in a shared service. This is immediately useful when doing data work - especially scheduled, unsupervised processes - and also serves as an activity log for other people to see.&lt;/p&gt;

&lt;p&gt;We’ll create both an easy-to-use online validation tool to check spreadsheets against a certain schema and an API with client libraries that can be integrated into existing processing pipelines. The reported issues can be full-out errors, but also probes that highlight implausible values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Who is working on it? [100 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Open Knowledge Foundation is...&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. What part of the project have you already built? [100 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ve got extensive experience working with dataset metadata from DataHub.io and have produced a number of complex data processing pipelines (e.g. for UK spending data, which merges over 5000 spreadsheets in different formats). These clearly show the need for better reporting. We have built several ad-hoc solutions, but know that this is a major area that is inadequately addressed in our work and that of others. We already have a basic prototype and can build a first increment quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. How would you use News Challenge funds? [50 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ll build it! We’ll develop a full version of this service iteratively, test and promote it. We plan to work together with civic data projects as early adopters to get quick feedback and adapt the service to suit their needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. How would you sustain the project after the funding expires? [50 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This will be perfectly suited to a SaaS freemium model in which heavy and/or professional users who need to report large numbers of errors and generate complex reports pay a subscription fee. In addition, as open-source software, the project can be re-used and extended by others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you think this is a good idea, &lt;a href=&quot;http://github.com/okfn/dataissues&quot;&gt;help out by hacking on and contributing patches to the dataissues repository&lt;/a&gt;!&lt;/strong&gt;&lt;/p&gt;
</content>
   <author>
     <name>Friedrich Lindenberg</name>
   </author>
 </entry>
 
 <entry>
   <title>Grano - social network analysis for advocates and journalists.</title>
   <link href="http://okfnlabs.org/blog/2012/07/09/grano.html"/>
   <updated>2012-07-09T00:00:00-07:00</updated>
   <id>http://okfnlabs.org/blog/2012/07/09/grano</id>
   <content type="html">&lt;p&gt;&lt;em&gt;On June 21st, the Knight News Challenge Round on Data ended. The day before,
&lt;a href=&quot;http://rufuspollock.org/&quot;&gt;Rufus&lt;/a&gt;, &lt;a href=&quot;https://twitter.com/rossjones&quot;&gt;Ross&lt;/a&gt; and
I sat down to write out some ideas that we'd been discussing for a while. The
first idea I want to repost here is a proposal for Grano, which I've &lt;a href=&quot;http://pudo.org/2011/12/19/sna.html&quot;&gt;discussed
in this blog before&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. What do you propose to do? [20 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ll make a powerful tool for journalists and advocates to keep track of actors and their relationships in complex environments.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://pudo.org/images/grano.png&quot; alt=&quot;Grano&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. How will your project make data more useful? [50 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’ll enable users to manage research in a structured way, helping them to link raw data to the actors, events and organisations they’re already investigating and to find those that they may have missed before. We’ll help users do their job more thoroughly, while creating a structure that can be re-used later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. How is your project different from what already exists? [30 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Network analysis means different things to different people: it’s graph algorithms to coders, network diagrams to designers and CRM to business. Journalists and advocates need evidence gathering and information linkage to be at the core of these things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Why will it work? [100 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We want to focus on four functions that will make this a practical tool instead of a gimmick:&lt;/p&gt;

&lt;p&gt;a) allowing users to easily integrate bulk data to complement manually entered information,&lt;/p&gt;

&lt;p&gt;b) helping them to keep track of the source for each fact that is entered and keeping a full version history,&lt;/p&gt;

&lt;p&gt;c) providing easy access control so that users can choose which information to keep private and which links to share with others and&lt;/p&gt;

&lt;p&gt;d) text snippets, so that researchers can combine structured analysis and narrative fragments in which the tool will detect references to the network’s entities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Who is working on it? [100 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Open Knowledge Foundation wants to cooperate with investigative networks around the world to develop this project. We’ve already been pioneering data collection and presentation tools, such as DataHub.io and OpenSpending, as well as efforts like the Data Journalism Handbook and the School of Data to widen data literacy.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Friedrich Lindenberg (OKFN) has worked on several data projects and data-journalism training and will lead this project.&lt;/li&gt;
&lt;li&gt;Ross Jones (OKFN) will contribute as a software architect.&lt;/li&gt;
&lt;li&gt;Stefan Candea (2011 Nieman Fellow at Harvard, Director of the Romanian Center for Investigative Journalism) has offered to advise us.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&lt;strong&gt;6. What part of the project have you already built? [100 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ve already built Grano, a REST backend that can store network information, generate custom reports about nodes and relations and run full-text search. Because we think that meaningful network analysis is hard, we are conservative in the choice of technology to focus on outcomes. To force that, we decided to base our tool on a concrete use case. The software is first being used in an unannounced project that tracks lobbying in the EU, powering a special-purpose, JavaScript-only site. Unfortunately, this means the current prototype does not yet have a stand-alone web interface or the serious data integration capabilities we think it needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. How would you use News Challenge funds? [50 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We want to develop Grano to give investigative journalists and civic hackers a (re-usable) web interface to design their network structure, manually enter data, integrate bulk data sets and to explore the resulting network, make notes, calculate key metrics and to export reports, rankings and network visualizations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. How would you sustain the project after the funding expires? [50 words]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While the service is going to be of immediate use, we believe that advocacy groups and newsrooms will also deploy it as a backend to their features and campaign sites. We aim to make Grano into a thriving open source project, supported through custom services for power users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you like this idea, please vote for it on the &lt;a href=&quot;http://newschallenge.tumblr.com/post/25572174408/grano&quot;&gt;proposal page&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;
</content>
   <author>
     <name>Friedrich Lindenberg</name>
   </author>
 </entry>
 
 
</feed>
