The two adjacent clusters should now be represented in the parallel plot. How
do they compare?
In the parallel plot, check the “Show Means” box.
The smaller of the two clusters represents overall smaller flowers. Because
the mean plot lines of the two clusters are roughly parallel, the ratios between
measures within each of the clusters appear to be similar.
As a matter of practice, check the “Show Means” box whenever you compare
clusters in a synchronized parallel plot.
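The comparison that the mean lines support can also be checked numerically. The sketch below is only an illustration: the measurements are hypothetical values invented for this example (they are not taken from Iris.csv), and the names `clusters` and `cluster_means` are assumptions. It computes the per-measure mean of each cluster, then the ratio of the larger cluster's mean to the smaller cluster's mean for each measure. If that ratio is roughly constant across measures, the proportions between measures within each cluster are similar.

```python
from statistics import mean

# Hypothetical observations for two clusters, four measures each.
# These numbers are invented for illustration and are NOT from Iris.csv.
clusters = {
    "smaller": [(4.0, 2.0, 3.0, 1.0), (4.4, 2.2, 3.2, 1.2), (4.2, 2.4, 3.4, 0.8)],
    "larger": [(6.0, 3.0, 4.6, 1.4), (6.4, 3.4, 4.8, 1.6), (6.5, 3.5, 5.0, 1.5)],
}


def cluster_means(rows):
    """Column-wise means: the values a per-cluster mean line would trace."""
    return [mean(column) for column in zip(*rows)]


means = {name: cluster_means(rows) for name, rows in clusters.items()}

# Ratio of the larger cluster's mean to the smaller cluster's mean, per measure.
ratios = [big / small for big, small in zip(means["larger"], means["smaller"])]

# A small spread means one cluster is close to a scaled copy of the other.
spread = max(ratios) - min(ratios)

for name, values in means.items():
    print(name, [round(v, 2) for v in values])
print("ratios:", [round(r, 2) for r in ratios], "spread:", round(spread, 4))
```

With real data the ratios will not be identical, so what counts as "roughly constant" is a judgment call in the same way that "roughly parallel" is in the plot.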
Choosing the Grid Dimensions
The choice of grid dimensions when building a SOM clustering depends on the
objective of that clustering. A smaller grid has fewer total cells, so
observations that would otherwise fall into adjacent cells are forced into the
same cell. Hence, if the objective is to isolate a few large clusters that can
be individually studied using additional data mining methodologies, choose a
small grid size.
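The effect of grid size can be sketched outside the clustering tool. The code below is a minimal online SOM written for illustration only; it is not the algorithm or the settings used by the software in this chapter. The function names (`train_som`, `summarize`), the learning-rate and neighborhood schedules, and the synthetic data (three Gaussian blobs standing in for a real dataset) are all assumptions. It trains one small and one larger grid on the same data and counts the occupied cells and the cells holding fewer than four observations.

```python
import math
import random
from collections import Counter
from itertools import product


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_som(data, dims, epochs=30, seed=0):
    """Train a tiny online SOM; returns {grid cell: prototype vector}."""
    rng = random.Random(seed)
    cells = list(product(*(range(d) for d in dims)))
    protos = {c: list(rng.choice(data)) for c in cells}  # initialize from data
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total
            lr = 0.5 * (1.0 - frac)  # learning rate decays toward zero
            radius = max(max(dims) / 2 * (1.0 - frac), 0.5)  # shrinking neighborhood
            bmu = min(cells, key=lambda c: sq_dist(protos[c], x))  # best-matching cell
            for c in cells:
                # Cells close to the best-matching cell on the grid move more.
                h = math.exp(-sq_dist(c, bmu) / (2 * radius ** 2))
                protos[c] = [p + lr * h * (xi - p) for p, xi in zip(protos[c], x)]
            step += 1
    return protos


def summarize(data, protos, small=4):
    """Count occupied cells and those with fewer than `small` observations."""
    sizes = Counter(min(protos, key=lambda c: sq_dist(protos[c], x)) for x in data)
    return {
        "cells": len(protos),
        "clusters": len(sizes),
        "small_clusters": sum(1 for n in sizes.values() if n < small),
        "observations": sum(sizes.values()),
    }


# Synthetic stand-in data: three 4-dimensional Gaussian blobs (not the iris data).
rng = random.Random(1)
centers = [(1.0, 1.0, 1.0, 1.0), (3.0, 3.0, 3.0, 3.0), (3.5, 3.5, 3.5, 3.5)]
data = [[rng.gauss(m, 0.3) for m in c] for c in centers for _ in range(20)]

results = {dims: summarize(data, train_som(data, dims))
           for dims in [(2, 2, 1), (3, 3, 3)]}
for dims, row in results.items():
    print(dims, row)
```

The number of clusters can never exceed the number of cells, so the 2 × 2 × 1 grid yields at most four clusters however the data are distributed. A larger grid leaves room for the same observations to spread over more cells, which typically produces more clusters and more small ones.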
Consider the cluster analysis just completed for the iris dataset with the
3 × 3 × 3 grid. There were 15 total clusters generated. Six clusters contained
three or fewer observations.
Redo the Iris clustering, choosing a grid size of 2 × 2 × 1.
Open Iris.csv in a parallel plot synchronized with the SOM. Use the
parallel plot to evaluate each of the three clusters generated.
Compare statistics between the two clusterings. (See Table 7.1, rows one
and three.)
Redo the Iris clustering once more, choosing a grid size of 2 × 2 × 2. Evaluate
and compare.
Table 7.1
Grid Statistics

Grid Dimensions   No. Clusters   No. Small Clusters (< 4 obs)   MSE      Correlation
2 × 2 × 1         3              0                              0.1060   0.704
2 × 2 × 2         5              1                              0.0872   0.648
3 × 3 × 3         15             6                              0.0423   0.516
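A few derived figures make the trend in Table 7.1 easier to read. The sketch below copies the rows of the table and, assuming each grid dimension entry is a product of three sizes (for example 2 × 2 × 1), computes the total number of cells, the fraction of cells that ended up as clusters, and the share of clusters that are small. The variable names are illustrative assumptions.

```python
from math import prod

# Rows copied from Table 7.1:
# (grid dimensions, no. clusters, no. small clusters, MSE, correlation)
table = [
    ((2, 2, 1), 3, 0, 0.1060, 0.704),
    ((2, 2, 2), 5, 1, 0.0872, 0.648),
    ((3, 3, 3), 15, 6, 0.0423, 0.516),
]

summary = []
for dims, clusters, small, mse, corr in table:
    cells = prod(dims)                 # total cells in the grid
    occupied = clusters / cells        # fraction of cells that became clusters
    small_share = small / clusters     # share of clusters with fewer than 4 obs
    summary.append((dims, cells, occupied, small_share))
    print(dims, cells, round(occupied, 3), round(small_share, 3), mse, corr)
```

Read this way, the table shows that as the grid grows, MSE falls (0.1060 to 0.0423) while the share of small clusters rises from 0 to 6 of 15 and the correlation drops from 0.704 to 0.516.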
 