How to do it…
Now we see some return on having defined the defanalysis macro. We can create a
wrapper function for Weka's HierarchicalClusterer in just a few lines:
1. We define the wrapper function like this:
(defanalysis
  hierarchical HierarchicalClusterer buildClusterer
  [["-A" distance EuclideanDistance .getName]
   ["-L" link-type :centroid
    #(str/upper-case (str/replace (name %) \- \_))]
   ["-N" k nil :not-nil]
   ["-D" verbose false :flag-true]
   ["-B" distance-of :node-length :flag-equal :branch-length]
   ["-P" print-newick false :flag-true]])
2. Using this, we can filter out the petal dimension fields and perform the analysis:
(def iris-petal
  (filter-attributes
    iris
    [:petallength :petalwidth :class]))
3. Now we can use this data to train a new clusterer:
(def hc
(hierarchical iris-petal :k 3 :print-newick true))
4. To see which cluster an instance falls in, we use clusterInstance, and we can
check the same index in the full dataset to see all of the attributes for that instance:
user=> (.clusterInstance hc (.get iris-petal 2))
0
user=> (.get iris 2)
#<DenseInstance 4.7,3.2,1.3,0.2,Iris-setosa>
How it works…
Hierarchical clustering usually works in a bottom-up manner. Its process is fairly simple:
1. Identify the two data points or clusters that are closest to each other.
2. Group them into a new cluster positioned at the centroid of the pair.
3. In the population, replace the pair with the new group.
4. Repeat until we're left with only the number of clusters we expect (the "-N" option mentioned previously).
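The steps above can be sketched in plain Clojure, without Weka. This is a minimal illustration, not the algorithm HierarchicalClusterer actually runs (Weka also supports other link types, as the "-L" option shows); all names here, such as agglomerate, are hypothetical, and distances are measured between cluster centroids with plain Euclidean distance:

```clojure
(defn- distance
  "Euclidean distance between two points (vectors of numbers)."
  [a b]
  (Math/sqrt (reduce + (map #(let [d (- %1 %2)] (* d d)) a b))))

(defn- centroid
  "The coordinate-wise mean of a collection of points."
  [points]
  (mapv #(/ (reduce + %) (count points))
        (apply map vector points)))

(defn agglomerate
  "Bottom-up clustering: repeatedly merge the two closest clusters
  until only k remain. Each cluster is a map with its member :points
  and their :centroid."
  [k points]
  (loop [clusters (mapv (fn [p] {:points [p] :centroid p}) points)]
    (if (<= (count clusters) k)
      clusters
      ;; Step 1: find the closest pair of clusters.
      (let [pairs (for [i (range (count clusters))
                        j (range (inc i) (count clusters))]
                    [i j (distance (:centroid (clusters i))
                                   (:centroid (clusters j)))])
            [i j _] (apply min-key peek pairs)
            ;; Step 2: group the pair into a new cluster at their centroid.
            merged-points (into (:points (clusters i))
                                (:points (clusters j)))
            merged {:points merged-points
                    :centroid (centroid merged-points)}]
        ;; Steps 3-4: replace the pair with the new group and repeat.
        (recur (conj (vec (keep-indexed
                            (fn [idx c] (when-not (#{i j} idx) c))
                            clusters))
                     merged))))))
```

For example, (agglomerate 2 [[0.0 0.0] [0.1 0.0] [10.0 10.0]]) leaves the two nearby points grouped in one cluster and the outlier alone in the other.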