Working with Incanter Datasets - Clojure Data Analysis

We'll use the following require statement to load the libraries we need:

(require '[clojure.java.io :as io]
         '[clojure.data.csv :as csv]
         '[clojure.string :as str]
         '[incanter.core :as i])

For our data file, we'll use the same data that we introduced in the Selecting columns with $ recipe: China's development dataset from the World Bank.

How to do it…

In this recipe, we'll take a look at how to join two datasets using Incanter:

1. To begin with, we'll load the data from the data/chn/chn_Country_en_csv_v2.csv file. We'll use the with-header and read-country-data functions that were defined in the Selecting columns with $ recipe:

(def data-file "data/chn/chn_Country_en_csv_v2.csv")

(def chn-data (read-country-data data-file))
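
The with-header and read-country-data functions come from the earlier recipe and are not reproduced in this section. A minimal sketch of what they might look like follows. It assumes the CSV has a header row whose names contain spaces, and it introduces a hypothetical skip-lines parameter (not part of the original recipe) for any preamble lines that precede the header:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.data.csv :as csv]
         '[clojure.string :as str]
         '[incanter.core :as i])

;; Sketch only: the real definitions live in the earlier recipe.
(defn with-header
  "Turns a seq of rows (the first being the header) into a seq of maps
  keyed by keywords such as :Indicator-Code or :1990."
  [rows]
  (let [headers (map #(keyword (str/replace % " " "-")) (first rows))]
    (map #(zipmap headers %) (rest rows))))

(defn read-country-data
  "Reads filename into an Incanter dataset. skip-lines is a hypothetical
  parameter: the number of preamble lines before the header row."
  ([filename] (read-country-data filename 0))
  ([filename skip-lines]
   (with-open [r (io/reader filename)]
     ;; doall forces the lazy seq before the reader is closed.
     (i/to-dataset
       (doall (with-header (drop skip-lines (csv/read-csv r))))))))
```

In this sketch every cell stays a string. Converting the year columns to numbers is left out, and skip-lines should be set to match your copy of the file.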

2. Currently, each row contains the data for one indicator across many years. However, for some analyses, it will be more helpful to have each row contain the data for one indicator for one year. To do this, let's first pull out the data from two years into separate datasets. Note that for the second dataset, we'll only include a column to match the first dataset (:Indicator-Code) and the data column (:2000):

(def chn-1990
  (i/$ [:Indicator-Code :Indicator-Name :1990]
       chn-data))

(def chn-2000
  (i/$ [:Indicator-Code :2000] chn-data))
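
To see what this kind of column selection produces, here is a small self-contained sketch. The toy dataset and its values are made-up placeholders, not World Bank figures. Only i/dataset, i/$, i/col-names, and i/nrow from incanter.core are used:

```clojure
(require '[incanter.core :as i])

;; Hypothetical stand-in for chn-data, with made-up values.
(def toy-data
  (i/dataset [:Indicator-Code :Indicator-Name :1990 :2000]
             [["CODE.A" "Indicator A" 1.0 10.0]
              ["CODE.B" "Indicator B" 2.0 20.0]]))

;; Passing a vector of column names to i/$ returns a new dataset
;; containing just those columns.
(def toy-2000
  (i/$ [:Indicator-Code :2000] toy-data))

(println (i/col-names toy-2000)) ; the two selected columns
(println (i/nrow toy-2000))      ; same number of rows as toy-data
```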

3. Now, we'll join these datasets back together. This is contrived, but it's easy to see how we would do this in a more meaningful example. For instance, we might want to join the datasets from two different countries:

(def chn-decade
  (i/$join [:Indicator-Code :Indicator-Code]
           chn-1990 chn-2000))

From this point on, we can use chn-decade just as we use any other Incanter dataset.
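
To check how the join behaves without the World Bank file, the same steps can be run on a tiny in-memory dataset. This is only a sketch: the toy-* names and all values are made-up placeholders, and the column order of the joined dataset is not assumed.

```clojure
(require '[incanter.core :as i])

;; Hypothetical stand-ins for chn-1990 and chn-2000 (made-up values).
(def toy-1990
  (i/dataset [:Indicator-Code :Indicator-Name :1990]
             [["CODE.A" "Indicator A" 1.0]
              ["CODE.B" "Indicator B" 2.0]]))

(def toy-2000
  (i/dataset [:Indicator-Code :2000]
             [["CODE.A" 10.0]
              ["CODE.B" 20.0]]))

;; The first vector names the key column in the left and the right
;; dataset, in that order.
(def toy-decade
  (i/$join [:Indicator-Code :Indicator-Code] toy-1990 toy-2000))

;; The joined dataset works with the usual Incanter functions.
(println (i/nrow toy-decade))
(println (set (i/col-names toy-decade)))
(println (i/$ [:Indicator-Name :1990 :2000] toy-decade))
```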
