Matt Thompson was one of 2017’s Frictionless Data Tool Fund grantees. He was tasked with extending the implementation of the core Frictionless Data Data Package and Table Schema libraries in the Clojure programming language. You can read more about this in his grantee profile. In this post, Matt shows how to set up and use the Clojure libraries for working with Tabular Data Packages.
This tutorial uses a worked example of downloading a data package from a remote location on the web, and using the Frictionless Data tools to read its contents and metadata into Clojure data structures.
Setup
First, we need to set up the project structure using the Leiningen tool. If you don’t have Leiningen installed, download and install it from leiningen.org. Once it is installed, run `lein new app periodic-table` from the command line to create the folders and files for a basic Clojure application project.
This creates a periodic-table folder. Inside the periodic-table/src/periodic_table folder there should be a file named core.clj. Leiningen converts the hyphen in the project name to an underscore for source directories. This is the file you will edit during this tutorial.
The Data
For this tutorial, we will use a pre-created data package, the Periodic Table Data Package hosted by the Frictionless Data project. A Data Package is a simple container format used to describe and package a collection of data. It consists of two parts:
- Metadata that describes the structure and contents of the package
- Resources such as data files that form the contents of the package
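To make the two parts concrete, here is a hypothetical, heavily trimmed descriptor written as a Clojure map. The keys (`:name`, `:title`, `:resources`, `:path`, `:schema`, `:fields`, `:type`) follow the Data Package and Table Schema specifications. The values, including the `data.csv` path and the package name, are illustrative assumptions and are not copied from the Periodic Table package:

```clojure
;; Hypothetical, trimmed descriptor mirroring the shape of a datapackage.json
;; file. Values are illustrative, not taken from the real package.
(def example-descriptor
  {:name      "periodic-table-example"          ; hypothetical package name
   :title     "Periodic Table (illustrative)"   ; hypothetical title
   :resources [{:name   "data"
                :path   "data.csv"              ; hypothetical path
                :schema {:fields [{:name "atomic number"      :type "integer"}
                                  {:name "symbol"             :type "string"}
                                  {:name "name"               :type "string"}
                                  {:name "atomic mass"        :type "number"}
                                  {:name "metal or nonmetal?" :type "string"}]}}]})

;; Part 1: package-level metadata (everything except the resource list).
(defn package-metadata [descriptor]
  (dissoc descriptor :resources))

;; Part 2: the resources that hold, or point at, the actual data.
(defn resource-paths [descriptor]
  (mapv :path (:resources descriptor)))
```

Here `(package-metadata example-descriptor)` returns only the `:name` and `:title` entries, and `(resource-paths example-descriptor)` returns `["data.csv"]`.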
Our Clojure code will download the data package and process it using the metadata contained in the package. The data package is hosted on GitHub.
The data package contains data about elements in the periodic table, including each element’s name, atomic number, symbol and atomic weight. The table below shows a sample taken from the first three rows of the CSV file:
atomic number | symbol | name | atomic mass | metal or nonmetal? |
---|---|---|---|---|
1 | H | Hydrogen | 1.00794 | nonmetal |
2 | He | Helium | 4.002602 | noble gas |
3 | Li | Lithium | 6.941 | alkali metal |
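As a rough illustration of what a tabular resource looks like once read in, the sketch below turns those three rows into Clojure maps using only `clojure.string`. It assumes simple comma-separated values with no quoted fields. Real CSV parsing would normally use a dedicated library such as clojure.data.csv. The column keywords are our own hypothetical choice:

```clojure
(require '[clojure.string :as str])

;; The three sample rows from the table above, as plain CSV text.
;; Assumption: no field contains a comma or quote characters.
(def sample-csv
  "atomic number,symbol,name,atomic mass,metal or nonmetal?
1,H,Hydrogen,1.00794,nonmetal
2,He,Helium,4.002602,noble gas
3,Li,Lithium,6.941,alkali metal")

;; Column keywords chosen for this sketch (hypothetical names).
(def columns [:atomic-number :symbol :name :atomic-mass :metal-or-nonmetal])

(defn parse-rows
  "Splits CSV text into lines, drops the header line, and zips each
  row's fields with `columns`. All values stay as strings."
  [csv-text]
  (->> (str/split-lines csv-text)
       rest
       (map #(str/split % #","))
       (map #(zipmap columns %))))
```

Every value comes back as a string. For example, the first row's atomic mass is `"1.00794"` rather than a number, which is why casting types matters later in this tutorial.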
Loading the Data Package
The first step is to load the data package into a Clojure data structure (a map). To do this, we require the data package library in our code, giving it the alias `dp`. We then use its `load` function to load our data package into the project. Enter the following code into the core.clj file:
This pulls the data in from the remote GitHub location and converts the metadata into a Clojure map. We can access this metadata by using the `descriptor` function along with keys such as `:name` and `:title` to get the relevant information:
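Because the metadata is an ordinary Clojure map, the usual keyword and `get-in` lookups apply to it. The sketch below uses a small hand-written map instead of the real result of `descriptor`. Its keys follow the Data Package specification, but its values are hypothetical:

```clojure
;; Hand-written stand-in for a descriptor map (values are hypothetical).
(def descriptor-map
  {:name      "periodic-table-example"
   :title     "Periodic Table (illustrative)"
   :resources [{:path   "data.csv"
                :schema {:fields [{:name "atomic number" :type "integer"}
                                  {:name "atomic mass"   :type "number"}]}}]})

;; Keywords act as functions of maps: (:name m) is equivalent to (get m :name).
(println "Name:" (:name descriptor-map))
(println "Title:" (:title descriptor-map))

;; get-in walks nested maps and vectors by key or index.
(def field-names
  (mapv :name (get-in descriptor-map [:resources 0 :schema :fields])))
```

With this map, `field-names` evaluates to `["atomic number" "atomic mass"]`.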
The package descriptor contains metadata that describes the contents of the data package. What about accessing the data itself? We can get to it using the `get-resources` function:
The above code locates the data in the data package, then goes through it line by line and prints the contents.
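Going through the data "line by line" in Clojure is typically a `doseq` over a sequence of rows. The sketch below is self-contained, so it uses a hard-coded vector of the three sample rows as a stand-in for the resource data. We don't show the exact shape returned by `get-resources` here, so treat the vector-of-vectors layout as an assumption:

```clojure
(require '[clojure.string :as str])

;; Stand-in for rows read from the data package resource
;; (the three sample rows, with values kept as strings).
(def rows
  [["1" "H"  "Hydrogen" "1.00794"  "nonmetal"]
   ["2" "He" "Helium"   "4.002602" "noble gas"]
   ["3" "Li" "Lithium"  "6.941"    "alkali metal"]])

(defn print-rows
  "Prints each row on its own line, with fields separated by commas."
  [rows]
  (doseq [row rows]
    (println (str/join ", " row))))
```

Calling `(print-rows rows)` prints one line per element, starting with `1, H, Hydrogen, 1.00794, nonmetal`.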
Casting Types with clojure.spec
We can use Clojure’s spec library to define a schema for our data, which can then be used to cast the types of the data in the CSV file.
Below is a spec description of a periodic element type, consisting of an atomic number, atomic symbol, the element’s name, its mass, and whether or not the element is a metal or non-metal:
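For readers new to spec, here is a minimal sketch of such a description using `clojure.spec.alpha`, which ships with Clojure 1.9 and later. The keyword names and predicates are our own assumptions. The sample data uses categories such as "noble gas" and "alkali metal" rather than a strict metal/non-metal flag, so this sketch accepts any non-blank string for that column instead of guessing the full list of values:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])

;; One spec per column (keyword names are hypothetical).
(s/def ::atomic-number pos-int?)
(s/def ::symbol (s/and string? #(re-matches #"[A-Z][a-z]{0,2}" %)))
(s/def ::name (s/and string? (complement str/blank?)))
(s/def ::atomic-mass (s/and double? pos?))
;; Free-text category, e.g. "nonmetal" or "alkali metal".
(s/def ::metal-or-nonmetal (s/and string? (complement str/blank?)))

;; A periodic element is a map containing all five (unqualified) keys.
(s/def ::element
  (s/keys :req-un [::atomic-number ::symbol ::name
                   ::atomic-mass ::metal-or-nonmetal]))
```

With these specs, `(s/valid? ::element {:atomic-number 1 :symbol "H" :name "Hydrogen" :atomic-mass 1.00794 :metal-or-nonmetal "nonmetal"})` is true. The same map with `:atomic-number "1"` is rejected, because that value is a string rather than an integer.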
The above spec can be used to cast values in our tabular data so that they match the specified schema. The example below shows our tabular data values being cast to fit the spec description. Then the `-main` function loops through the elements, printing only those with an atomic mass over 10.
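One way to make spec cast as well as validate is to wrap a parsing function in `s/conformer`. During `s/conform`, the function's return value replaces the original string, or `::s/invalid` is returned when parsing fails. The sketch below uses that technique. It is our own approach, not necessarily the one datapackage-clj takes, and the `->long`, `->double` and `cast-row` helper names are hypothetical. The three sample rows contain no element heavier than 10, so a neon row (standard atomic weight 20.1797) is added here purely so the filter has something to keep:

```clojure
(require '[clojure.spec.alpha :as s])

;; Conformers that cast CSV strings to numbers, or return ::s/invalid.
(defn ->long [v]
  (try (Long/parseLong (str v))
       (catch NumberFormatException _ ::s/invalid)))

(defn ->double [v]
  (try (Double/parseDouble (str v))
       (catch NumberFormatException _ ::s/invalid)))

(s/def ::atomic-number (s/conformer ->long))
(s/def ::atomic-mass (s/conformer ->double))
(s/def ::symbol string?)
(s/def ::name string?)
(s/def ::metal-or-nonmetal string?)

(s/def ::element
  (s/keys :req-un [::atomic-number ::symbol ::name
                   ::atomic-mass ::metal-or-nonmetal]))

;; Raw rows as read from CSV: every value is a string.
;; The neon row is added for this sketch only.
(def raw-rows
  [{:atomic-number "1"  :symbol "H"  :name "Hydrogen" :atomic-mass "1.00794"  :metal-or-nonmetal "nonmetal"}
   {:atomic-number "2"  :symbol "He" :name "Helium"   :atomic-mass "4.002602" :metal-or-nonmetal "noble gas"}
   {:atomic-number "3"  :symbol "Li" :name "Lithium"  :atomic-mass "6.941"    :metal-or-nonmetal "alkali metal"}
   {:atomic-number "10" :symbol "Ne" :name "Neon"     :atomic-mass "20.1797"  :metal-or-nonmetal "noble gas"}])

(defn cast-row
  "Conforms a raw row against ::element, casting the numeric columns.
  Throws ex-info if the row does not conform."
  [row]
  (let [conformed (s/conform ::element row)]
    (if (s/invalid? conformed)
      (throw (ex-info "Row does not match ::element"
                      {:row row :explain (s/explain-data ::element row)}))
      conformed)))

(defn -main [& _args]
  ;; Print the name and mass of every element heavier than 10.
  (doseq [{el-name :name mass :atomic-mass} (map cast-row raw-rows)
          :when (> mass 10)]
    (println el-name mass)))
```

Running `(-main)` with these rows prints `Neon 20.1797`. The lighter sample elements are filtered out after their masses have been cast to doubles.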
When run, the program produces the following output:
This concludes our simple tutorial for using the Clojure libraries for Frictionless Data.
We welcome your feedback and questions via our Frictionless Data Gitter chat or via GitHub issues on the datapackage-clj repository.