Figure 6-28 shows the use of a curve instead of bars. The total area under
the curve is equal to one, and the vertical axis represents the probability
or a proportion of a value in the sample population.
In this example, missing values are removed for the sake of simplicity. When
you visualize and explore your own data, you should look more closely at
missing values. Why are the values missing? Should they be set to zero or
removed completely?
CREATE A DENSITY PLOT
Returning to the birth rates data, you need to take one extra step to make
a density plot. It's a small step. You need to use the density() function to
estimate the points for the curve; however, the data can't have any missing
values. In fact, 15 rows in the 2008 data have no value at all.
In R these spots with missing values are labeled as NA. Luckily, it's easy to
filter those spots out.
birth2008 <- birth$X2008[!is.na(birth$X2008)]
This line of code retrieves the 2008 column from the birth data frame, keeps
only the values that aren't missing, and stores the result in birth2008. More
technically speaking, is.na() checks each item in the birth$X2008 vector and
returns an equal-length vector of TRUE and FALSE values known as booleans
(R calls them logical values). The ! in front flips each value, so TRUE now
marks the items that are not missing. When you pass a vector of booleans
into the index of a vector, only the items that correspond to TRUE values
are returned. Don't worry if this confuses you. You don't have to know the
technical details to get it to work. If, however, you plan on writing your own
functions, it helps to know the language. It can also make documentation
easier to read, but you'll tend to pick it up with practice.
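If you want to see the indexing described above in isolation, here is a minimal sketch. The rates vector is made up for illustration and is not the birth rates data; the names rates, missing, n_missing, and clean are placeholders.

```r
# Made-up values standing in for one column of data; NA marks a missing value
rates <- c(12.5, NA, 30.1, 18.7, NA, 42.0)

# is.na() returns a logical vector the same length as its input
missing <- is.na(rates)
print(missing)    # values: FALSE TRUE FALSE FALSE TRUE FALSE

# sum() counts the TRUE values, so this is the number of missing entries
n_missing <- sum(missing)
print(n_missing)  # value: 2

# ! flips TRUE and FALSE; indexing with the result keeps the non-missing items
clean <- rates[!missing]
print(clean)      # values: 12.5 30.1 18.7 42.0
```

Counting the TRUE values with sum() is also a quick way to find out how many rows you are about to drop before you drop them.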
Now you have the clean birth rate data stored in birth2008, so you can
pass it into the density() function to estimate a curve and store the results
in d2008.
d2008 <- density(birth2008)
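To get a feel for what density() hands back, here is a rough sketch on made-up numbers rather than the birth rates data. The names sample_values, d, area, and out_file are placeholders, and the temporary file is just a stand-in for wherever you would actually save the output. It checks that the area under the estimated curve is close to one and writes the coordinates to a tab-delimited text file.

```r
# Made-up sample for illustration, NOT the birth rate data
set.seed(1)
sample_values <- rnorm(200, mean = 20, sd = 5)

# Estimate the curve with the default settings
d <- density(sample_values)

# d$x and d$y hold the coordinates of the curve (512 points by default)
print(length(d$x))
print(length(d$y))

# Rough check that the area under the curve is close to one;
# the x values are evenly spaced, so a simple sum of rectangles is enough
area <- sum(d$y) * (d$x[2] - d$x[1])
print(area)

# Save the coordinates to a tab-delimited text file for use elsewhere
out_file <- tempfile(fileext = ".txt")  # placeholder location
write.table(data.frame(x = d$x, y = d$y), out_file,
            sep = "\t", row.names = FALSE, quote = FALSE)
```

The area will not come out as exactly one, because the curve is only evaluated at a finite set of points over a limited range, but it should be very close.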
The density() function gives you the x- and y-coordinates for the curve. The
cool thing is that you can save these coordinates into a text file in case you
want to use a different program to make the plot. Type d2008 in the R console
to see what's stored in the variable. Here's what you get.
Call:
density.default(x = birth2008)
Data: birth2008 (219 obs.); Bandwidth 'bw' = 3.168