Graphics Reference
Figure . shows a dendrogram for the dentition data created using an
agglomerative algorithm with Manhattan distance and the average linkage method.
Manhattan distance was chosen because it has a direct interpretation for these
data: it equals the total difference between the teeth-group counts of two
given species.
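As a minimal sketch of this interpretation, the snippet below computes the Manhattan distance between two count vectors. The function name and the two count vectors are made up for illustration (they are not taken from the table); each vector is assumed to hold eight teeth-group counts, top and bottom for each of the four tooth types.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical teeth-group counts (illustrative values, not from the table):
# top/bottom incisors, top/bottom canines, top/bottom premolars, top/bottom molars.
species_a = [2, 3, 1, 1, 3, 3, 3, 3]
species_b = [2, 3, 1, 1, 2, 3, 3, 3]

# The two vectors differ only in the top premolar count, by one tooth.
print(manhattan(species_a, species_b))
```

A distance of zero therefore means two identical teeth configurations, and a distance of one means a single tooth of difference in one group.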
First we look for animals separated by the minimal distance. Obviously the
minimum possible difference is zero, i.e., animals with identical teeth
configurations, like the two bats shown in the last two rows of Table . .
Animals separated by zero distance are depicted by vertical lines immediately
to the left of their names, as shown for mink, weasel, ferret, badger and skunk
at the top of the graph. After all of the animals separated by zero distance
have been found and connected, those groups that are the next smallest average
distance apart are joined. In our case, the next smallest distance possible is
a difference of one tooth, as seen for the pygmy bat and the house bat (also
shown in Table . ).
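The merging procedure described above can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the software used to produce the figure: the function names are hypothetical, the four count vectors are made up, and ties are broken by taking the first closest pair found.

```python
from itertools import combinations

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def average_linkage(points):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (cluster_1, cluster_2, average_distance),
    where each cluster is a tuple of indices into `points`.
    """
    clusters = [(i,) for i in range(len(points))]  # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean of all pairwise distances between members.
            total = sum(manhattan(points[i], points[j]) for i in c1 for j in c2)
            d = total / (len(c1) * len(c2))
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        # Replace the two closest clusters by their union.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, d))
    return merges

# Hypothetical count vectors (illustrative only): two identical animals,
# one that differs by a single tooth, and one outlier.
points = [
    [2, 3, 1, 1, 3, 3, 3, 3],
    [2, 3, 1, 1, 3, 3, 3, 3],
    [2, 3, 1, 1, 2, 3, 3, 3],
    [0, 0, 0, 0, 0, 0, 8, 8],
]
merges = average_linkage(points)
for c1, c2, d in merges:
    print(c1, c2, round(d, 3))
```

With these made-up values, the two identical vectors are joined first at distance zero, the vector that differs by one tooth joins them next at average distance one, and the outlier is attached last.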
Note that the actual layout of a dendrogram is not unique, because at each
branching point the top and bottom branches could be exchanged. For example,
the armadillo, which represents an outlier here because it has eight molars and
no other types of teeth, could equally well have been placed at the top of the
graph. N - 1 branching points are needed to connect all N data points, so the
total number of dendrograms that could be drawn for exactly the same clustering
is 2^(N-1). This is much smaller than the number of all possible permutations
(N!), but is still quite a large number. Therefore, many software packages that
perform hierarchical clustering allow the user to rearrange the observations,
either manually or by specifying an ordering function.
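To make the counting argument concrete: a binary dendrogram over N leaves has N - 1 branching points, and flipping each one independently gives 2^(N-1) leaf orderings. The sketch below enumerates them for a small tree and compares the count with N!. The nested-tuple tree representation and the function names are assumptions made for this illustration.

```python
from math import factorial

def layout_count(n):
    """Leaf orderings obtainable by flipping the n - 1 branching points."""
    return 2 ** (n - 1)

def leaf_orders(tree):
    """Enumerate all leaf orderings of a nested-tuple binary tree.

    A leaf is any non-tuple value; an inner node is a (left, right) tuple.
    """
    if not isinstance(tree, tuple):
        return [[tree]]
    left, right = tree
    orders = []
    for lo in leaf_orders(left):
        for ro in leaf_orders(right):
            orders.append(lo + ro)  # original orientation
            orders.append(ro + lo)  # branches exchanged
    return orders

# Hypothetical 4-leaf dendrogram: (a, b) and (c, d) are joined first.
tree = (("a", "b"), ("c", "d"))
orders = leaf_orders(tree)

# Enumerated layouts, the closed-form count, and all permutations of 4 leaves.
print(len(orders), layout_count(4), factorial(4))
```

An ordering function, as offered by many packages, amounts to choosing one of these orderings according to some criterion instead of accepting whichever one the algorithm happens to produce.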
11.2.2 Heatmaps
Of course, we can cluster the variables as well as the observations. For
example, we might be interested in whether the animals differ more in terms of
type or top/bottom jaw. At the top of Fig. . is a dendrogram for the variables
sorted as they appear in the data set, i.e., with the top and bottom of each
type next to each other. The original sorting of the data is compatible with
hierarchical clustering of the variables (as depicted by the dendrogram),
because there are no crossing lines in the tree. This leads to the (rather
obvious) conclusion that the variables for the same type of teeth on the top
and bottom jaws are very similar.
Figure . is a so-called cluster heatmap. The main part is an image plot of the
original data, where each cell in the matrix corresponds to a value in the
original data set. Rows and columns are permuted to conform with the
hierarchical clustering of observations and variables; the corresponding
dendrograms are placed to the left and on top of the matrix, respectively.
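The permutation step can be sketched as follows. This is an illustration only: the small matrix is made up, the row and column orders are assumed to be the leaf orders read off two dendrograms computed elsewhere, and the text rendering stands in for a real image plot.

```python
def reorder(matrix, row_order, col_order):
    """Permute rows and columns of a list-of-lists matrix."""
    return [[matrix[r][c] for c in col_order] for r in row_order]

def render(matrix, shades=" .:-=+*#%@"):
    """Map each value to a character so that larger counts look darker."""
    top = max(max(row) for row in matrix) or 1  # avoid dividing by zero
    last = len(shades) - 1
    return ["".join(shades[v * last // top] for v in row) for row in matrix]

# Hypothetical data: rows are observations, columns are variables.
data = [
    [3, 0, 3],
    [0, 8, 0],
    [3, 1, 3],
]
# Assumed leaf orders from the row and column dendrograms.
row_order = [0, 2, 1]
col_order = [0, 2, 1]

clustered = reorder(data, row_order, col_order)
for line in render(clustered):
    print(line)
```

After reordering, similar rows and similar columns sit next to each other, which is what makes the block and stripe patterns visible in the image.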
Many important features of this data set can be easily picked out using the
heatmap representation. The strongest patterns are the four “vertical stripes”
for each of the four types of teeth, because many animals have the same (or
very similar) counts on the top and bottom jaws. We can also see that the
number of canines in general is rather low, while the other three tooth types
show “blocks” of animals with either high or low counts. For example, the
predators in the upper rows have larger incisor and premolar counts, while the
rodents in the bottom rows have more molars.