Make Use of Multidimensional Scaling
Multidimensional scaling is much easier to understand with a concrete
example, so jump right in. Return to the education data; if you
haven't loaded it in R already, do that first.
education <-
  read.csv("http://datasets.flowingdata.com/education.csv",
    header=TRUE)
Remember, there is a row for each state, including the District of
Columbia, plus the United States averages. There are six
variables for each state: reading, math, and writing SAT scores;
percentage of graduates who took the SAT; pupil-to-staff ratio; and dropout rate.
It's just like the room metaphor, but instead of a square room, it's a square
plot; instead of people, there are states; and instead of height and weight,
you have education-related metrics. The goal is the same. You want to
place the states on an x-y plot, so that similar states are closer together.
First step: Figure out how far each state should be from every other state. Use
the dist() function, which does just that. You use only columns 2 through 7
because the first column is state names, and you know all those are different.
ed.dis <- dist(education[,2:7])
If you type ed.dis in the console, you see a series of matrices. Each cell
represents how far one state should be from another (by Euclidean
distance). For example, the value in the second row, second column over
is the distance Alabama should be from Alaska. The units aren't so
important at this point. Rather, it's the relative differences that matter.
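To see what dist() hands back without scrolling through the full state output, you can try it on something tiny first. The following is a minimal sketch, not part of the education example: the data frame toy, its columns u and v, and the three points A, B, and C are all made up for illustration. It also uses as.matrix(), which expands the printed lower triangle into a full square matrix that you can index by name.

```r
# Hypothetical toy data: three made-up points, not the education data.
toy <- data.frame(
  name = c("A", "B", "C"),
  u = c(0, 3, 3),
  v = c(0, 4, 0)
)

# Use the names as row labels so they carry through to the distances.
rownames(toy) <- toy$name

# Skip the name column, the same way education[,2:7] skips state names.
toy.dis <- dist(toy[, 2:3])
toy.dis   # prints only the lower triangle

# as.matrix() expands this into a full symmetric matrix.
toy.mat <- as.matrix(toy.dis)
toy.mat["A", "B"]   # 5, because sqrt(3^2 + 4^2) = 5
```

The diagonal of toy.mat is all zeros, because every point is zero distance from itself, and the matrix is symmetric, because the distance from A to B is the same as from B to A.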
How do you plot this 51 by 51 matrix on an x-y plot? You can't, until you
have an x-y coordinate for each state. That's what cmdscale() is for. It takes
a distance matrix as input and returns a set of points so that the
distances between those points are about the same as specified in the matrix.
ed.mds <- cmdscale(ed.dis)
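If you want to check how close "about the same" is, you can compare the distances between the returned points against the distances you passed in. Here is a rough sketch on made-up data; the data frame toy, its columns a, b, and c, and the five row names are hypothetical and stand in for the education data only to keep the example self-contained.

```r
# Hypothetical toy data: five made-up items measured on three variables.
toy <- data.frame(
  a = c(1, 2, 6, 7, 4),
  b = c(2, 1, 7, 6, 4),
  c = c(0.5, 0.4, 0.6, 0.5, 0.3),
  row.names = c("P", "Q", "R", "S", "T")
)

toy.dis <- dist(toy)          # Euclidean distances in three dimensions
toy.mds <- cmdscale(toy.dis)  # two coordinates per row by default (k = 2)
dim(toy.mds)                  # one row per item, two columns

# Distances between the new 2-D points, compared with the originals.
recovered <- dist(toy.mds)
gap <- as.vector(toy.dis) - as.vector(recovered)
max(abs(gap))                 # the largest mismatch across all pairs
```

Squeezing three variables into two coordinates loses a little information, so the recovered distances can come out slightly shorter than the originals, but with Euclidean input they should not come out longer.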
Type ed.mds in the console, and you see that you now have x-y coordinates for
each row of data. Store these in the variables x and y, and toss them into
plot() to see what it looks like (Figure 7-27).
x <- ed.mds[,1]
y <- ed.mds[,2]
plot(x,y)
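A plot of bare dots doesn't tell you which point is which. One possible way around that, sketched here on made-up data rather than the education data, is to draw an empty plot with type="n" and then write each row's name at its coordinates with text(). The data frame toy and its row names are hypothetical.

```r
# Hypothetical toy data: five made-up items, not the education data.
toy <- data.frame(
  a = c(1, 2, 6, 7, 4),
  b = c(2, 1, 7, 6, 4),
  c = c(0.5, 0.4, 0.6, 0.5, 0.3),
  row.names = c("P", "Q", "R", "S", "T")
)

toy.mds <- cmdscale(dist(toy))
x <- toy.mds[, 1]
y <- toy.mds[, 2]

# type="n" sets up the axes without drawing any points.
plot(x, y, type = "n")

# Write each row name at its x-y position instead of a dot.
text(x, y, labels = rownames(toy.mds))
```

The row names survive the trip through dist() and cmdscale(), so rownames(toy.mds) lines up with x and y without any extra bookkeeping.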