How to do it…
In this recipe, we'll first rename the columns in the dataset. Then we'll look at two different
ways to remove columns, one destructive and one not.
Renaming columns
We'll create a function to rename the attributes with a sequence of keywords, and then we'll
see this function in action:
1. First, we'll define a function that takes a dataset and a sequence of field names,
and then renames the columns in the dataset to match those passed in:
(defn set-fields [instances field-seq]
  (doseq [n (range (.numAttributes instances))]
    (.renameAttribute instances
                      (.attribute instances n)
                      (name (nth field-seq n)))))
2. Now, let's look at the dataset's current column names:
user=> (map #(.. data (attribute %) name)
            (range (.numAttributes data)))
("Country-Code" "Year" "AG.SRF.TOTL.K2" "AG.LND.AGRI.ZS" "AG.LND.AGRI.K2")
3. These are the names that the World Bank gives these fields, but we can change the field
names to something more obvious:
(set-fields data
            [:country-code :year
             :total-land :agri-percent :agri-total])
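To see which strings set-fields would hand to Weka, the keyword-to-name pairing can be sketched without any Weka objects. The helper rename-plan below is hypothetical and not part of the recipe; it also checks the lengths up front, since nth inside set-fields would throw if field-seq were shorter than the number of columns:

```clojure
;; Column names copied from the REPL output above; no Weka objects are needed.
(def old-names
  ["Country-Code" "Year" "AG.SRF.TOTL.K2" "AG.LND.AGRI.ZS" "AG.LND.AGRI.K2"])

(def new-fields
  [:country-code :year :total-land :agri-percent :agri-total])

(defn rename-plan
  "Hypothetical helper: pairs each old column name with the string that
  (name ...) yields for the matching entry in field-seq."
  [column-names field-seq]
  (when (not= (count column-names) (count field-seq))
    (throw (IllegalArgumentException.
            "field-seq must supply one name per column")))
  (mapv (fn [old-name field] [old-name (name field)])
        column-names
        field-seq))

(rename-plan old-names new-fields)
```

Because name accepts both keywords and strings, field-seq could just as well be a vector of strings.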
Removing columns
This dataset also contains a number of columns that we won't use, for example, the field
agri-percent. Since it won't ever be used, we'll destructively remove it from the dataset:
1. Weka allows us to delete attributes by index, but we want to specify them by name.
We'll write a function that takes an attribute name and returns the index:
(defn attr-n [instances attr-name]
  (->> instances
       (.numAttributes)
       range
       (map #(vector % (.. instances (attribute %) name)))
       (filter #(= (second %) (name attr-name)))
       ffirst))
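The lookup in attr-n can be tried on plain data. The sketch below swaps the Weka dataset for a vector of names (column-names and name-index are stand-ins invented for illustration) and keeps the same pair-filter-ffirst shape, assuming the columns have already been renamed:

```clojure
;; Hypothetical stand-in for the renamed Weka dataset: just the column names.
(def column-names
  ["country-code" "year" "total-land" "agri-percent" "agri-total"])

(defn name-index
  "Returns the index of the first entry in names equal to attr-name
  (a keyword or string), or nil if there is no such entry."
  [names attr-name]
  (->> names
       (map-indexed vector)                        ; ([0 "country-code"] [1 "year"] ...)
       (filter #(= (second %) (name attr-name)))   ; keep pairs whose name matches
       ffirst))                                    ; index of the first match, or nil

(name-index column-names :agri-percent) ;=> 3
(name-index column-names "year")        ;=> 1
(name-index column-names :missing)      ;=> nil
```

Note the nil result for an unknown name: attr-n behaves the same way, so a caller that passes its result on as an index may want to check for nil first.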
 