How to do it…
Now we see some return on having defined the defanalysis macro. We can create a
wrapper function for Weka's HierarchicalClusterer in just a few lines:
1. We define the wrapper function like this:
(defanalysis
  hierarchical HierarchicalClusterer buildClusterer
  [["-A" distance EuclideanDistance .getName]
   ["-L" link-type :centroid
    #(str/upper-case (str/replace (name %) \- \_))]
   ["-N" k nil :not-nil]
   ["-D" verbose false :flag-true]
   ["-B" distance-of :node-length :flag-equal :branch-length]
   ["-P" print-newick false :flag-true]])
2. Using this, we can filter out the petal dimension fields and perform the analysis:
(def iris-petal
  (filter-attributes
    iris
    [:petallength :petalwidth :class]))
3. Now we can use this data to train a new clusterer:
(def hc
(hierarchical iris-petal :k 3 :print-newick true))
4. To see which cluster an instance falls in, we use clusterInstance, and we can
check the same index in the full dataset to see all of the attributes for that instance:
user=> (.clusterInstance hc (.get iris-petal 2))
0
user=> (.get iris 2)
#<DenseInstance 4.7,3.2,1.3,0.2,Iris-setosa>
How it works…
Hierarchical clustering usually works in a bottom-up manner. Its process is fairly simple:
1. Identify the two data points or clusters that are closest to each other.
2. Group them into a new cluster positioned at the centroid of the pair.
3. In the population, replace the pair with the new group.
4. Repeat until we're left with only the number of clusters we expect (the "-N" option mentioned previously).
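The steps above can be sketched in plain Clojure, without Weka. This is a minimal illustration, not the algorithm HierarchicalClusterer actually runs (Weka also supports other link types, as the "-L" option shows); all names here, such as agglomerate, are hypothetical, and distances are measured between cluster centroids with plain Euclidean distance:

```clojure
(defn- distance
  "Euclidean distance between two points (vectors of numbers)."
  [a b]
  (Math/sqrt (reduce + (map #(let [d (- %1 %2)] (* d d)) a b))))

(defn- centroid
  "The coordinate-wise mean of a collection of points."
  [points]
  (mapv #(/ (reduce + %) (count points))
        (apply map vector points)))

(defn agglomerate
  "Bottom-up clustering: repeatedly merge the two closest clusters
  until only k remain. Each cluster is a map with its member :points
  and their :centroid."
  [k points]
  (loop [clusters (mapv (fn [p] {:points [p] :centroid p}) points)]
    (if (<= (count clusters) k)
      clusters
      ;; Step 1: find the closest pair of clusters.
      (let [pairs (for [i (range (count clusters))
                        j (range (inc i) (count clusters))]
                    [i j (distance (:centroid (clusters i))
                                   (:centroid (clusters j)))])
            [i j _] (apply min-key peek pairs)
            ;; Step 2: group the pair into a new cluster at their centroid.
            merged-points (into (:points (clusters i))
                                (:points (clusters j)))
            merged {:points merged-points
                    :centroid (centroid merged-points)}]
        ;; Steps 3-4: replace the pair with the new group and repeat.
        (recur (conj (vec (keep-indexed
                            (fn [idx c] (when-not (#{i j} idx) c))
                            clusters))
                     merged))))))
```

For example, (agglomerate 2 [[0.0 0.0] [0.1 0.0] [10.0 10.0]]) leaves the two nearby points grouped in one cluster and the outlier alone in the other.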