Creating a Dendrogram
Problem
You want to make a dendrogram to show how items are clustered.
Solution
Use hclust() and plot the output from it. This can require a fair bit of data preprocessing. For
this example, we'll first take a subset of the countries data set from the year 2009. For simplicity,
we'll also drop all rows that contain an NA, and then select a random 25 of the remaining
rows:
library(gcookbook) # For the data set
# Get data from year 2009
c2 <- subset(countries, Year == 2009)
# Drop rows that have any NA values
c2 <- c2[complete.cases(c2), ]
# Pick out a random 25 countries
# (Set random seed to make this repeatable)
set.seed(201)
c2 <- c2[sample(1:nrow(c2), 25), ]
c2
                Name Code Year        GDP laborrate  healthexp infmortality
6731        Mongolia  MNG 2009  1690.4170      72.9   74.19826         27.8
1733          Canada  CAN 2009 39599.0418      67.8 4379.76084          5.2
...
5966  Macedonia, FYR  MKD 2009  4510.2380      54.0  313.68971         10.6
10148   Turkmenistan  TKM 2009  3710.4536      68.0   77.06955         48.0
Notice that the row names (the unlabeled first column) look like random numbers: they are the
original row indices, and the rows were selected randomly. We need to do a few more things to
the data before making a dendrogram from it. First, we need to set the row names. Right now
there's a column called Name, but the row names are those leftover indices (we don't often use
row names, but for the hclust() function they're essential). Next, we'll need to drop all the
columns that aren't values used for clustering. These columns are Name, Code, and Year:
rownames(c2) <- c2$Name
c2 <- c2[, 4:7]
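To see why the row names matter, here is a minimal sketch of the remaining steps on a small made-up data frame. The `toy` data frame, its column names, and its values are invented for illustration and are not part of the countries data. The `scale()` step is also an assumption rather than something stated above: it is a common choice when columns are measured on very different scales, as GDP and infmortality are here.

```r
# Toy stand-in for c2; the names and values are invented for illustration
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  x    = c(1, 2, 10, 11),
  y    = c(100, 150, 800, 900),
  stringsAsFactors = FALSE
)

# Same preparation as above: names become row names, then keep numeric columns
rownames(toy) <- toy$Name
toy <- toy[, c("x", "y")]

# Assumption: standardize each column so no single variable dominates
toy_scaled <- scale(toy)

# dist() computes Euclidean distances between rows; hclust() clusters them
hc <- hclust(dist(toy_scaled))

# The leaf labels are taken from the row names set earlier
print(hc$labels)

# Draw the dendrogram
plot(hc)
```

If the row names had been left as numeric indices, the leaves of the dendrogram would be labeled with those numbers instead of the item names.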