
The Data Wrangling Blog

  • 29 November 2017 Vitor Baptista

    Validating scraped data using goodtables

    We have to deal with many challenges when scraping a page. What’s the page’s layout? How do I extract the bits of data I want? How do I know when the layout changes and breaks my code? How can I be sure that my code...
  • 03 November 2017 DataHub Team

    Core Data on DataHub.io

    This blog post was originally published on datahub.io by Rufus Pollock, Meiran Zhiyenbayev & Anuar Ustayev. The “Core Data” project provides essential data for data wranglers and the data science community. Its online home is on the DataHub: https://datahub.io/core and https://datahub.io/docs/core-data. This post introduces you to...
  • This post walks you through the major changes in the Data Package v1 specs compared to pre-v1. It covers changes across the full suite of Data Package specifications, including Data Resources and Table Schema. It is particularly valuable if you were using Data Packages pre...
  • 05 October 2017 Serah Rono

    Frictionless Data Specs v1 Updates

    The Frictionless Data team released the v1 specifications in the first week of September 2017, and Paul Walsh, Chief Product Officer at Open Knowledge International, wrote a detailed blog post about it. With this milestone, in addition to modifications to pre-existing specifications like Table Schema and CSV...
  • 13 July 2017 Dan Fowler

    Measure for Measure

    In his Open Knowledge International Tech Talk, developer Brook Elgie describes how we are using Data Package Pipelines and Redash to gain insight into our organization in a declarative, reproducible, and easy-to-modify way. This post briefly introduces a newly launched internal project at...
  • All blog posts…