Clustering, Classifying, and Working with Weka - Clojure Data Analysis


How to do it…

In this recipe, we'll first rename the columns from the dataset. Then we'll look at two different ways to remove columns, one destructive and one not.

Renaming columns

We'll create a function to rename the attributes with a sequence of keywords, and then we'll

see this function in action:

1. First, we'll define a function that takes a dataset and a sequence of field names, and then renames the columns in the dataset to match those passed in:

(defn set-fields [instances field-seq]
  (doseq [n (range (.numAttributes instances))]
    (.renameAttribute instances
                      (.attribute instances n)
                      (name (nth field-seq n)))))

2. Now, let's look at the dataset's current column names:

user=> (map #(.. data (attribute %) name)
            (range (.numAttributes data)))
("Country-Code" "Year" "AG.SRF.TOTL.K2" "AG.LND.AGRI.ZS" "AG.LND.AGRI.K2")

3. These are the names that the World Bank gives these fields, but we can change the field names to something more obvious:

(set-fields data
            [:country-code :year
             :total-land :agri-percent :agri-total])
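The set-fields function relies on name to turn each keyword into the plain string that Weka expects, and on nth to pick the new name for each column position. A minimal sketch of just that conversion, using only clojure.core and no Weka objects (new-fields is a name introduced here for illustration, not part of the recipe):

```clojure
;; Hypothetical name, introduced for illustration only.
(def new-fields
  [:country-code :year :total-land :agri-percent :agri-total])

;; `name` drops the leading colon and returns a string.
(map name new-fields)
;; => ("country-code" "year" "total-land" "agri-percent" "agri-total")

;; `nth` looks up the keyword for a given column position.
(nth new-fields 2)
;; => :total-land
```

Because nth throws when the index is out of range, the sequence passed to set-fields needs one entry per column in the dataset.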

Removing columns

This dataset also contains a number of columns that we won't use, for example, the field agri-percent. Since it won't ever be used, we'll destructively remove it from the dataset:

1. Weka allows us to delete attributes by index, but we want to specify them by name. We'll write a function that takes an attribute name and returns the index:

(defn attr-n [instances attr-name]
  (->> instances
       (.numAttributes)
       range
       (map #(vector % (.. instances (attribute %) name)))
       (filter #(= (second %) (name attr-name)))
       ffirst))
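The lookup in attr-n pairs each index with its attribute name, keeps the pairs whose name matches, and returns the index of the first match. The same logic can be sketched on a plain vector of strings standing in for the Weka attribute list (index-of-name and column-names are hypothetical names used only for this illustration):

```clojure
;; Stand-in for the dataset's attribute names after renaming (assumed data).
(def column-names
  ["country-code" "year" "total-land" "agri-percent" "agri-total"])

;; Hypothetical helper mirroring attr-n, but over a vector of strings.
(defn index-of-name [names attr-name]
  (->> names
       (map-indexed vector)                       ; [index name] pairs
       (filter #(= (second %) (name attr-name)))  ; keep matching names
       ffirst))                                   ; index of the first match

(index-of-name column-names :agri-percent)
;; => 3

(index-of-name column-names :missing)
;; => nil
```

When no name matches, the result is nil rather than an index, so a caller that goes on to delete a column by index may want to check for nil first.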
