Judged by MSE, a measure of cohesion only, the 3 × 3 × 3 clustering would be
considered best. Keep in mind, however, that smaller clusters tend to have
lower MSEs. Judged by the coefficient of correlation, a measure of both
cohesion and separation, the 2 × 2 × 1 clustering comes out on top.
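To see why MSE alone tends to favor finer clusterings, consider the following minimal sketch. It assumes MSE means the mean squared Euclidean distance from each point to the centroid of its own cluster; the exact formula used by the software may differ. The function name and the toy data are hypothetical.

```python
def cluster_mse(points, labels):
    """Mean squared Euclidean distance from each point to its cluster centroid.

    Assumed definition of MSE; the tool described in the text may normalize
    differently.
    """
    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        for m in members:
            total += sum((m[d] - centroid[d]) ** 2 for d in range(dim))
    return total / len(points)


# Hypothetical one-dimensional data: two tight pairs far apart.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
one_cluster = cluster_mse(points, [0, 0, 0, 0])
two_clusters = cluster_mse(points, [0, 0, 1, 1])
four_clusters = cluster_mse(points, [0, 1, 2, 3])
print(one_cluster, two_clusters, four_clusters)  # 25.25 0.25 0.0
```

The MSE drops to zero once every point sits in its own cluster, so a lower MSE does not by itself indicate a more meaningful clustering.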
If the objective of the clustering is to gain a better understanding of the data,
all clusterings provide some insight. In the 3 × 3 × 3 clustering, the separation
of the Setosa cluster indicates that these flowers are quite distinct from the
others. The numerous adjacent clusters near the opposite corner reflect a
difficulty in categorizing both the Versicolor and Virginica flowers based on
petal and sepal size measurements. The larger merged cells in the smaller
clusterings again reflect the difficulty in distinguishing between Versicolor and
Virginica.
Advantages of a 3-D Grid
To illustrate the advantages of a 3-D grid, let's evaluate the voting records of US
congressional representatives. In 2004, the US House voted on 50 proposed acts.
The dataset SelectedVotes.csv contains the voting records for representatives on
the acts that passed with less than a 90% majority. In other words, those votes
in which almost all representatives were in agreement have been removed. There
is one observation per representative. Each act has its own column, named after
the act. A “yes” vote is recorded as 1 and a “no” vote as 0. If a representative
did not vote on an act, that vote is recorded as 0.5. In addition to the voting
record, the dataset also contains the columns RepName, State, and Party.
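The vote encoding can be illustrated with a short sketch. Only the numeric codes (1, 0, and 0.5) come from the description above; the raw labels "yes" and "no", the act names, and the function names are assumptions made for illustration.

```python
# Assumed raw labels; the dataset description only specifies the numeric codes.
VOTE_CODES = {"yes": 1.0, "no": 0.0}
ABSENT_CODE = 0.5  # the representative did not vote on the act


def encode_vote(raw_vote):
    """Map a raw vote label to 1 (yes), 0 (no), or 0.5 (did not vote)."""
    return VOTE_CODES.get(raw_vote.strip().lower(), ABSENT_CODE)


def encode_record(raw_votes):
    """Encode one representative's votes, keyed by act name."""
    return {act: encode_vote(vote) for act, vote in raw_votes.items()}


# Hypothetical act names and votes, for illustration only.
record = encode_record({"Act A": "Yes", "Act B": "no", "Act C": ""})
print(record)  # {'Act A': 1.0, 'Act B': 0.0, 'Act C': 0.5}
```

Because 0.5 lies midway between 0 and 1, a missing vote is equally distant from a “yes” and a “no” in any distance-based clustering.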
Open the SelectedVotes.csv dataset.
Create a derived set named “votes” with the vote columns only. That is,
exclude the Party, RepName, and State columns.
Create a 3 × 3 × 3 clustering of “votes”.
Open the resulting model in the SOM viewer.
Open a tabular presentation of SelectedVotes.csv that is synchronized with
the “votes” SOM.
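For readers working outside the SOM software, the steps above can be approximated in plain Python. This is a rough sketch of a generic Kohonen self-organizing map on a 3 × 3 × 3 grid, not the algorithm or settings of the tool used here. The learning rate, neighborhood width, epoch count, and the inline sample rows are all assumptions; only the column names RepName, State, and Party come from the dataset description.

```python
import csv
import io
import math
import random

EXCLUDED = {"Party", "RepName", "State"}  # non-vote columns named in the text

# Hypothetical rows standing in for SelectedVotes.csv.
SAMPLE_CSV = """\
RepName,State,Party,Act A,Act B,Act C
Rep 1,XX,R,1,1,0
Rep 2,XX,R,1,1,0.5
Rep 3,YY,D,0,0,1
Rep 4,YY,D,0,0.5,1
"""


def load_votes(csv_file):
    """Return (rows, vote_columns, vectors); vectors hold vote columns only."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    vote_cols = [c for c in reader.fieldnames if c not in EXCLUDED]
    vectors = [[float(r[c]) for c in vote_cols] for r in rows]
    return rows, vote_cols, vectors


def best_matching_unit(weights, x):
    """Grid coordinate of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(vectors, shape=(3, 3, 3), epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM; all hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    nodes = [(i, j, k)
             for i in range(shape[0])
             for j in range(shape[1])
             for k in range(shape[2])]
    weights = {n: [rng.random() for _ in range(dim)] for n in nodes}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # shrinks linearly toward 0
        lr = lr0 * frac                      # decaying learning rate
        sigma = max(sigma0 * frac, 0.3)      # decaying neighborhood width
        for x in vectors:
            bmu = best_matching_unit(weights, x)
            for n in nodes:
                grid_d2 = sum((a - b) ** 2 for a, b in zip(n, bmu))
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood
                weights[n] = [w + lr * h * (v - w) for w, v in zip(weights[n], x)]
    return weights


rows, vote_cols, votes = load_votes(io.StringIO(SAMPLE_CSV))
som = train_som(votes)
cells = [best_matching_unit(som, v) for v in votes]  # one grid cell per representative
for row, cell in zip(rows, cells):
    print(row["RepName"], row["Party"], cell)
```

Keeping `rows` alongside `cells` plays the role of the synchronized tabular view: each representative's name, state, and party can be looked up for any cell of the cube.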
The SOM cube (Figure 7.10) contains two large clusters on opposite corners.
You can probably guess which groups these clusters represent. They are those
that consistently voted the party line. The largest cluster contains only
Republicans, while the other contains only Democrats.
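One way to check such a claim outside the viewer is to tabulate party membership per grid cell. The sketch below assumes each representative has already been assigned a cell of the cube; the cell assignments and party labels shown are made up for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict


def party_mix_by_cell(cells, parties):
    """Count party membership within each SOM grid cell."""
    mix = defaultdict(Counter)
    for cell, party in zip(cells, parties):
        mix[cell][party] += 1
    return dict(mix)


# Hypothetical assignments: two single-party corner cells and one mixed cell.
cells = [(0, 0, 0), (0, 0, 0), (0, 0, 0), (2, 2, 2), (2, 2, 2), (1, 1, 1), (1, 1, 1)]
parties = ["R", "R", "R", "D", "D", "R", "D"]

mix = party_mix_by_cell(cells, parties)
largest = max(mix, key=lambda c: sum(mix[c].values()))
single_party = {cell: len(counts) == 1 for cell, counts in mix.items()}

print(largest, dict(mix[largest]))  # (0, 0, 0) {'R': 3}
print(single_party)
```

A cell whose counter holds a single party corresponds to a group that voted the party line; mixed cells point to representatives whose records resemble those of the other party.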
 