low age lwt race smoke ptl ht ui ftv  bwt
  0  19 182    2     0   0  0  1   0 2523
  0  33 155    3     0   0  0  0   3 2551
  0  20 105    1     1   0  0  0   1 2557
...
We looked at the relationship between smoke (smoking) and bwt (birth weight in grams). The value of smoke is either 0 or 1, but because it is stored as a numeric vector, ggplot() doesn't know that it should be treated as a categorical variable. To have ggplot() treat smoke as categorical, we can either convert that column of the data frame to a factor, or tell ggplot() to treat it as a factor by using factor(smoke) inside the aes() call. For these examples, we converted it to a factor in the data.
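The two options can be sketched in base R. This is a minimal illustration, not the full data set: the data frame bw is a hypothetical stand-in built from the smoke and bwt values of the three rows shown above, and the fill mapping mentioned in the comment is only an example of where factor(smoke) could go.

```r
# Hypothetical three-row stand-in, using the smoke and bwt values shown above
bw <- data.frame(smoke = c(0, 0, 1), bwt = c(2523, 2551, 2557))

# smoke is stored as numbers, so a plot would treat it as continuous
smoke_is_numeric <- is.numeric(bw$smoke)   # TRUE

# Option 1: convert the column in the data (done here on a copy)
bw_factor <- bw
bw_factor$smoke <- factor(bw_factor$smoke)
levels(bw_factor$smoke)                    # "0" "1"

# Option 2: leave the data untouched and convert inside the mapping,
# for example aes(x = bwt, fill = factor(smoke)) in a ggplot() call.
# The original column stays numeric in that case.
still_numeric <- is.numeric(bw$smoke)      # TRUE
```

Converting in the data means every later plot sees a factor without further work, while converting inside aes() keeps the change local to a single plot.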
Another method for visualizing the distributions is to use facets, as shown in Figure 6-12. We can align the facets vertically or horizontally. Here we'll align them vertically so that it's easy to compare the two distributions:
Figure 6-12. Left: density curves with facets; right: with different facet labels
ggplot(birthwt1, aes(x = bwt)) + geom_density() +
    facet_grid(smoke ~ .)
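For comparison, the horizontal alignment mentioned above can be sketched by moving smoke to the other side of the facet formula. This sketch assumes that birthwt1 is a copy of the birthwt data from the MASS package with smoke converted to a factor. That setup is reconstructed here for self-containment and may differ from how the data was actually prepared. The name p_horizontal is ours.

```r
library(ggplot2)
library(MASS)   # assumed source of the birthwt data set

# Assumed reconstruction of birthwt1: a copy with smoke stored as a factor
birthwt1 <- birthwt
birthwt1$smoke <- factor(birthwt1$smoke)

# facet_grid() takes rows ~ columns, so placing smoke on the right of ~
# lays the facets out side by side instead of stacking them
p_horizontal <- ggplot(birthwt1, aes(x = bwt)) +
    geom_density() +
    facet_grid(. ~ smoke)

p_horizontal
```

With side-by-side facets the two panels share a y-axis rather than an x-axis, so the birth weight ranges are harder to compare directly, which is why the vertical layout is used here.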
One problem with the faceted graph is that the facet labels are just 0 and 1, and there's no label indicating that those values are for smoke. To change the labels, we need to change the names of the factor levels. First we'll take a look at the factor levels, then we'll assign new factor level names, in the same order: