
The Data Wrangling Blog

  • datapackage-pipelines is a framework for defining data processing steps to generate self-describing Data Packages, built on the concepts and tooling of the Frictionless Data project. You can read more about datapackage-pipelines in this introductory post. Data wrangling can be quite a tedious task - we...
  • When it comes to tabular data, the Frictionless Data specifications provide users with strong conventions for declaring both the shape of data (via schemas) and information about the data (as metadata on package and resource descriptors). Within the Frictionless Data world, we purposefully refer to...
  • 29 November 2017 Vitor Baptista

    Validating scraped data using goodtables

    We have to deal with many challenges when scraping a page. What’s the page’s layout? How do I extract the bits of data I want? How do I know when the layout changes and breaks my code? How can I be sure that my code...
  • 03 November 2017 DataHub Team

    Core Data on DataHub.io

    This blog post was originally published on datahub.io by Rufus Pollock, Meiran Zhiyenbayev & Anuar Ustayev. The “Core Data” project provides essential data for data wranglers and the data science community. Its online home is on the DataHub: https://datahub.io/core and https://datahub.io/docs/core-data. This post introduces you to...
  • This post walks you through the major changes in the Data Package v1 specs compared to pre-v1. It covers changes in the full suite of Data Package specifications including Data Resources and Table Schema. It is particularly valuable if: you were using Data Packages pre...
  • All blog posts…
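The posts above share one idea: declare the expected shape of tabular data as a schema, then check incoming rows against it. Here is a minimal, hand-rolled sketch of that idea in plain Python. The descriptor shape is modeled loosely on a Table Schema `fields` list, but the `validate_row` helper and its error messages are illustrative inventions, not the goodtables or tableschema API:

```python
# Minimal sketch: check scraped rows against a Table Schema-style
# descriptor. Hand-rolled illustration, not the goodtables API.

schema = {
    "fields": [
        {"name": "country", "type": "string"},
        {"name": "year", "type": "integer"},
        {"name": "emissions", "type": "number"},
    ]
}

# Map declared field types to Python cast functions.
CASTS = {"string": str, "integer": int, "number": float}

def validate_row(row, schema):
    """Return a list of error messages for one scraped row (a dict)."""
    errors = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name not in row:
            errors.append(f"missing field: {name}")
            continue
        try:
            CASTS[ftype](row[name])  # does the value cast to the declared type?
        except (TypeError, ValueError):
            errors.append(f"bad {ftype} in {name}: {row[name]!r}")
    return errors

rows = [
    {"country": "FR", "year": "2016", "emissions": "4.38"},
    {"country": "DE", "year": "n/a", "emissions": "8.89"},
]
for row in rows:
    print(validate_row(row, schema))
# → []
# → ["bad integer in year: 'n/a'"]
```

A real pipeline would hand this job to goodtables, which runs many more checks (blank rows, duplicate headers, constraint violations) and produces a structured report instead of a flat error list.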