Visualizing Relationships - Data Points: Visualization That Means Something

To find the outlier, use summary() again on birth_yearly.

      year           rate
 Min.   :1960   Min.   :  6.90
 1st Qu.:1973   1st Qu.: 18.06
 Median :1986   Median : 29.61
 Mean   :1985   Mean   : 29.94
 3rd Qu.:1997   3rd Qu.: 41.91
 Max.   :2008   Max.   :132.00

The maximum rate is 132. That seems off. No other rates even come close to 100. What's going on? It turns out that the rate recorded for Palau in 1999 is 132. This is most likely a typo because the rates for Palau before and after 1999 are no greater than 20. It's probably supposed to be 13.2, but you'd have to dig deeper to confirm that. For now, temporarily remove that mistake.

birth_yearly.new <- birth_yearly[birth_yearly$rate < 132,]
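One way to track down which row holds the suspicious maximum is to index the data frame with which.max(). The sketch below runs on a small made-up data frame rather than the real data: the country column name and every value except Palau's 1999 rate of 132 are assumptions for illustration only.

```r
# Toy stand-in for birth_yearly. The "country" column name and all values
# except Palau's 1999 rate of 132 are made up for illustration.
toy <- data.frame(
  country = c("Palau", "Palau", "Palau", "Aruba", "Aruba", "Aruba"),
  year    = c(1998, 1999, 2000, 1998, 1999, 2000),
  rate    = c(14.0, 132.0, 13.5, 15.2, 14.8, 14.1)
)

# which.max() returns the position of the largest rate,
# so this pulls out the full row that holds the outlier.
outlier <- toy[which.max(toy$rate), ]
print(outlier)

# Same filter as in the text: keep only rows below the suspect value.
toy.new <- toy[toy$rate < 132, ]
print(nrow(toy.new))
```

Seeing the whole row, not just the maximum, is what lets you compare the value against the same country's neighboring years.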

On to the labels for the years. When the values used for labels are stored as numeric, the lattice function automatically draws the orange bar to indicate the value instead of printing it. If the labels are characters, however, the function prints the strings themselves, so convert the years to characters.

birth_yearly.new$year <- as.character(birth_yearly.new$year)

You still need to update the order, but create the histogram matrix first and store it in a variable.

h <- histogram(~ rate | year, data=birth_yearly.new, layout=c(10,5))

Now use the update() function to change the order of the histograms.

update(h, index.cond=list(c(41:50, 31:40, 21:30, 11:20, 1:10)))
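The index vector passed to index.cond can also be built programmatically rather than typed out, which may help if the layout changes. This is a minimal sketch in base R, assuming the same 10-column, 5-row layout; the names n_cols, n_rows, panels, and row_reversed are hypothetical.

```r
n_cols <- 10
n_rows <- 5

# Panel positions 1..50, one layout row per matrix row.
panels <- matrix(seq_len(n_cols * n_rows), nrow = n_rows, byrow = TRUE)

# Reverse the row order, then flatten back to a vector row by row.
row_reversed <- as.vector(t(panels[n_rows:1, ]))

print(row_reversed)
```

The result could then be used as update(h, index.cond=list(row_reversed)). lattice also has an as.table argument that fills panels from the top row downward, which may give a similar arrangement without a manual index.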

This basically reverses the order of all the rows. As shown in Figure 6-34, you get a nicely labeled histogram matrix and a better sense of the distributions after removing the typo. Plus, the histograms are arranged more logically so that you can read from left to right, top to bottom. Read down a single column, one cell from each row, to see how the distribution has changed by decade.
