  c52  d21    1.5   66
  c52  d21    1.6   72
Suppose we want to find, for each case, the deviation of HeadWt from the overall mean. All we have to do is take the overall mean and subtract it from the observed value for each case:
transform(cabbages, DevWt = HeadWt - mean(HeadWt))
 Cult Date HeadWt VitC        DevWt
  c39  d16    2.5   51 -0.093333333
  c39  d16    2.2   55 -0.393333333
...
  c52  d21    1.5   66 -1.093333333
  c52  d21    1.6   72 -0.993333333
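As a quick check on the idea, the sketch below builds the same DevWt column by direct assignment instead of transform(), and confirms that deviations from the overall mean sum to zero up to floating-point error. It assumes only base R plus the MASS package (the source of cabbages); the name cb_all is just an illustrative choice.

```r
library(MASS)  # provides the cabbages data set

# Same column as transform() creates, assigned directly on a copy of the data
cb_all <- cabbages
cb_all$DevWt <- cb_all$HeadWt - mean(cb_all$HeadWt)

# Deviations from a mean always sum to zero (up to floating-point error)
sum(cb_all$DevWt)
```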
You'll often want to do separate operations like this for each group, where the groups are specified by one or more grouping variables. Suppose, for example, we want to normalize the data within each group by finding the deviation of each case from the mean within the group, where the groups are specified by Cult. In this case, we can use ddply() from the plyr package with the transform() function:
library(plyr)
cb <- ddply(cabbages, "Cult", transform, DevWt = HeadWt - mean(HeadWt))
cb
 Cult Date HeadWt VitC       DevWt
  c39  d16    2.5   51 -0.40666667
  c39  d16    2.2   55 -0.70666667
...
  c52  d21    1.5   66 -0.78000000
  c52  d21    1.6   72 -0.68000000
ddply() first splits cabbages into separate data frames based on the value of Cult. There are two levels of Cult, c39 and c52, so there are two data frames. It then applies the transform() function, with the remaining arguments, to each data frame, and finally combines the results into a single data frame.
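The splitting and applying described above, followed by stacking the pieces back together, can be sketched by hand in base R. This is not how plyr is implemented internally, only an equivalent written out step by step. It assumes the MASS package for cabbages; pieces and cb_manual are illustrative names.

```r
library(MASS)  # provides the cabbages data set

# Split: one data frame per level of Cult (here c39 and c52)
pieces <- split(cabbages, cabbages$Cult)

# Apply: run transform() on each piece, so mean() sees only that group's rows
pieces <- lapply(pieces, function(d) transform(d, DevWt = HeadWt - mean(HeadWt)))

# Combine: stack the pieces back into a single data frame
cb_manual <- do.call(rbind, pieces)
rownames(cb_manual) <- NULL  # rbind() on a named list creates names like "c39.1"
```

The result should contain the same values as cb from the ddply() call.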
Notice that the call to ddply() has all the same parts as the previous call to transform(). The only differences are that the parts are slightly rearranged and the splitting variable (in this case, Cult) is added.
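The same "add a grouping variable" idea also works without plyr. As a hedged base-R alternative, ave() returns, for each row, the mean of a variable within that row's group, so it can be dropped into the original transform() call. One difference worth noting: ave() keeps the original row order, whereas ddply() returns rows ordered by the grouping variable; because cabbages is already sorted by Cult, the two results line up here. The name cb_ave is illustrative.

```r
library(MASS)  # provides the cabbages data set

# ave(HeadWt, Cult) gives each row the mean HeadWt of its own Cult group,
# so this is the overall-mean transform() call with a grouping variable added
cb_ave <- transform(cabbages, DevWt = HeadWt - ave(HeadWt, Cult))
```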
The before and after results are shown in Figure 15-2:
library(ggplot2)
# The data before normalizing
ggplot(cb, aes(x = Cult, y = HeadWt)) + geom_boxplot()