Clustering, Classifying, and Working with Weka - Clojure Data Analysis


2. Now we can get the indexes of the data rows that are in each cluster by looking at the :sets key of iris-clusters. Then we can pull the species from each cluster's rows to look at the frequency of each species in each cluster:

user=> (doseq [[pos rws] (:sets iris-clusters)]
         (println pos \:
                  (frequencies
                    (i/sel iris :cols :Species
                                :rows rws))))
[4 1] : {virginica 23}
[8 1] : {virginica 27, versicolor 50}
[9 0] : {setosa 50}

So we can see that all 50 setosa rows land in a cell of their own, and all 50 versicolor rows land together in a second cell. The virginica rows are split: 23 are in their own cell, and the other 27 share the cell with the versicolors.
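As a rough check on that reading, the per-cell counts can be turned into a purity score: the share of each cell's rows that belong to its most common species. The sketch below is plain Clojure and hard-codes the counts printed above. The names cell-freqs, species-by-cell, and cell-purity are made up for this illustration, and representing species as strings is an assumption.

```clojure
;; Counts copied from the REPL output above. This literal map is an
;; illustrative stand-in for walking (:sets iris-clusters) directly;
;; species are assumed to be strings.
(def cell-freqs
  {[4 1] {"virginica" 23}
   [8 1] {"virginica" 27, "versicolor" 50}
   [9 0] {"setosa" 50}})

(defn species-by-cell
  "Invert {cell {species n}} into {species {cell n}}."
  [freqs]
  (reduce (fn [acc [cell counts]]
            (reduce (fn [a [species n]]
                      (assoc-in a [species cell] n))
                    acc
                    counts))
          {}
          freqs))

(defn cell-purity
  "Fraction of each cell's rows that belong to its most common species."
  [freqs]
  (into {}
        (for [[cell counts] freqs]
          [cell (/ (apply max (vals counts))
                   (reduce + (vals counts)))])))
```

Calling (cell-purity cell-freqs) gives 1 for the two single-species cells and the ratio 50/77 for the mixed one, while (species-by-cell cell-freqs) shows at a glance that virginica is the only species spread over two cells.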

How it works…

SOMs use a neural network to map data points onto a grid. As the network is trained, the data points converge into cells in the grid based on how similar the items are, so similar items end up in the same cell.
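To make the training idea concrete, here is a minimal online SOM sketch in plain Clojure. It is not the implementation that produced iris-clusters. It only illustrates the general technique: find the cell whose weight vector is closest to a sample, then pull that cell and (more weakly) its grid neighbours toward the sample, while the learning rate and neighbourhood radius shrink over time. All function names, the Gaussian neighbourhood, and the linear decay schedule are assumptions chosen for illustration.

```clojure
(defn sq-dist
  "Squared Euclidean distance between two equal-length vectors."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn best-cell
  "Grid position whose weight vector is closest to sample."
  [weights sample]
  (key (apply min-key (fn [[_ w]] (sq-dist w sample)) weights)))

(defn train-step
  "Pull the winning cell, and more weakly its grid neighbours, toward sample."
  [weights sample rate radius]
  (let [winner (best-cell weights sample)]
    (into {}
          (for [[pos w] weights]
            (let [grid-d (sq-dist pos winner)
                  ;; Gaussian neighbourhood: influence fades with grid distance
                  pull   (* rate (Math/exp (- (/ grid-d (* 2.0 radius radius)))))]
              [pos (mapv (fn [wi xi] (+ wi (* pull (- xi wi)))) w sample)])))))

(defn train
  "Run epochs passes over data, shrinking rate and radius linearly."
  [weights data epochs rate0 radius0]
  (reduce (fn [ws epoch]
            (let [decay  (- 1.0 (/ epoch (double epochs)))
                  rate   (* rate0 decay)
                  radius (max 0.5 (* radius0 decay))]
              (reduce #(train-step %1 %2 rate radius) ws data)))
          weights
          (range epochs)))

(defn cluster
  "Group row indexes by the grid cell each row maps to."
  [weights data]
  (group-by #(best-cell weights (nth data %)) (range (count data))))

;; Toy example: a 3x1 grid and two well-separated pairs of 2-D points.
(def toy-data [[0.0 0.1] [0.1 0.0] [0.9 1.0] [1.0 0.9]])
(def toy-init {[0 0] [0.0 0.0], [1 0] [0.5 0.5], [2 0] [1.0 1.0]})
(def toy-clusters (cluster (train toy-init toy-data 20 0.5 1.0) toy-data))
```

The result of cluster has the same shape as the :sets value used earlier: a map from grid position to the row indexes that landed in that cell. In the toy data, the two points near the origin end up in one cell and the two points near [1 1] in another.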

We can get the size of the output map using the :dims key:

user=> (:dims iris-clusters)

[10.0 2.0]

We can use this information, combined with the cell frequencies, to graph the clustering of data in the SOM:

[Figure: three panels, Virginica, Setosa, and Versicolor, each with a vertical axis running from 0 to 50.]
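One way to prepare data for such a graph is to lay each species' counts out on a grid the size of the map. The sketch below is plain Clojure and hard-codes the :dims value and cell frequencies shown earlier. The name grid-counts is hypothetical, and the code assumes that the first dimension matches the first coordinate of each cell position, which fits the cells [4 1], [8 1], and [9 0] on a 10 by 2 map. Species are again assumed to be strings.

```clojure
;; Values copied from the REPL output earlier in this recipe.
(def dims [10.0 2.0])

(def cell-freqs
  {[4 1] {"virginica" 23}
   [8 1] {"virginica" 27, "versicolor" 50}
   [9 0] {"setosa" 50}})

(defn grid-counts
  "For one species, a vector of rows of counts covering the whole map.
  Cells with no rows of that species are filled with 0."
  [dims freqs species]
  (let [[cols rows] (map long dims)]   ; dims come back as doubles
    (vec (for [y (range rows)]
           (vec (for [x (range cols)]
                  (get-in freqs [[x y] species] 0)))))))
```

For example, (grid-counts dims cell-freqs "virginica") yields two rows of ten counts, with 23 and 27 at positions 4 and 8 of the second row and zeros everywhere else. Each species' grid can then be handed to whatever charting function is in use.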

One of the downsides of SOMs is that the network's weights are largely opaque. We can see the groupings, but figuring out why the algorithm grouped them the way it did is difficult.
