We'll load the following namespaces with require:
(require '[clojure.java.io :as io]
         '[clojure.data.csv :as csv]
         '[clojure.string :as str]
         '[incanter.core :as i])
For our data file, we'll use the same data that we introduced in the Selecting columns
with $ recipe: China's development dataset from the World Bank.
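The helpers with-header and read-country-data used in the steps below are not repeated in this recipe. As a rough sketch only, and not necessarily the definitions from the Selecting columns with $ recipe, they might look like the following. The sketch assumes that the first CSV row holds the column names and that spaces in those names become hyphens, which would match keys such as :Indicator-Code.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.data.csv :as csv]
         '[clojure.string :as str]
         '[incanter.core :as i])

;; Assumed helper: turn a seq of CSV rows into a seq of maps keyed by
;; the header row, so "Indicator Code" becomes :Indicator-Code.
(defn with-header [rows]
  (let [headers (map #(keyword (str/replace % " " "-")) (first rows))]
    (map #(zipmap headers %) (rest rows))))

;; Assumed helper: read a whole CSV file into an Incanter dataset.
;; If the file has extra lines before the header row, they would have
;; to be dropped before calling with-header.
(defn read-country-data [filename]
  (with-open [r (io/reader filename)]
    ;; doall forces the lazy rows before the reader is closed.
    (i/to-dataset (doall (with-header (csv/read-csv r))))))
```

In this sketch every value stays a string, so numeric columns would still need parsing before any arithmetic.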
How to do it…
In this recipe, we'll take a look at how to join two datasets using Incanter:
1. To begin with, we'll load the data from the data/chn/chn_Country_en_csv_v2.csv
file. We'll use the with-header and read-country-data functions that
were defined in the Selecting columns with $ recipe:
(def data-file "data/chn/chn_Country_en_csv_v2.csv")
(def chn-data (read-country-data data-file))
2. Currently, each row contains the data for one indicator across many
years. However, for some analyses, it is more helpful to have each row contain
the data for one indicator for one year. To work toward this, let's first pull out the data
for two years into separate datasets. Note that for the second dataset, we'll only include
the column needed to match rows against the first dataset (:Indicator-Code) and the data
column (:2000):
(def chn-1990
  (i/$ [:Indicator-Code :Indicator-Name :1990]
       chn-data))
(def chn-2000
  (i/$ [:Indicator-Code :2000] chn-data))
3. Now, we'll join these datasets back together. This is contrived, but it's easy to see how
the same approach applies to a more meaningful case. For example, we might want to
join the datasets from two different countries:
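;; The vector names the key column in each dataset to match on:
;; first for chn-1990, then for chn-2000.
(def chn-decade
  (i/$join [:Indicator-Code :Indicator-Code]
           chn-1990 chn-2000))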
From this point on, we can use chn-decade just as we use any other Incanter dataset.
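
To make the two-country case mentioned in step 3 a little more concrete, here is a small sketch that joins two hand-built datasets on :Indicator-Code. The indicator codes, the column names :country-a and :country-b, and all of the values are made up for illustration; they are not World Bank figures.

```clojure
(require '[incanter.core :as i])

;; Hypothetical data: one value column per country, keyed by a
;; placeholder indicator code.
(def country-a
  (i/dataset [:Indicator-Code :country-a]
             [["X.1" 1.0]
              ["X.2" 2.0]]))

(def country-b
  (i/dataset [:Indicator-Code :country-b]
             [["X.1" 10.0]
              ["X.2" 20.0]]))

;; Match rows where :Indicator-Code agrees in both datasets.
(def joined
  (i/$join [:Indicator-Code :Indicator-Code]
           country-a country-b))

;; Inspect the result like any other dataset.
(println (i/col-names joined))
(println (i/nrow joined))
```

Giving each dataset its own value column name avoids any clash between same-named columns in the joined result.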
 