Graphics Reference
In-Depth Information
 c52  d21    1.5   66
 c52  d21    1.6   72
Suppose we want to find, for each case, the deviation of HeadWt from the overall mean. All we
have to do is take the overall mean and subtract it from the observed value for each case:
transform(cabbages, DevWt = HeadWt - mean(HeadWt))
 Cult Date HeadWt VitC        DevWt
  c39  d16    2.5   51 -0.093333333
  c39  d16    2.2   55 -0.393333333
 ...
  c52  d21    1.5   66 -1.093333333
  c52  d21    1.6   72 -0.993333333
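As a minimal sketch of what the transform() call above is doing, the following uses a small made-up data frame (toy is a hypothetical stand-in, not the real cabbages data). It assumes only base R and shows that mean(HeadWt) is computed once over every row, so the deviations are all relative to the same overall mean.

```r
# Hypothetical four-row stand-in for cabbages; the values are illustrative only
toy <- data.frame(
  Cult   = c("c39", "c39", "c52", "c52"),
  HeadWt = c(2.5, 2.2, 1.5, 1.6)
)

# mean(HeadWt) is evaluated over the whole column, then subtracted from each row
toy_dev <- transform(toy, DevWt = HeadWt - mean(HeadWt))

# The same computation written as a plain column assignment
toy$DevWt <- toy$HeadWt - mean(toy$HeadWt)

# Deviations from the overall mean should sum to (numerically) zero
print(toy_dev)
print(sum(toy_dev$DevWt))
```

Because the mean is taken over all rows, group membership in Cult plays no role in this version.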
You'll often want to perform an operation like this separately for each group, where the groups
are specified by one or more grouping variables. Suppose, for example, we want to normalize the
data within each group by finding the deviation of each case from the mean within the group, where
the groups are specified by Cult. In these cases, we can use ddply() from the plyr package with
the transform() function:
library(plyr)
cb <- ddply(cabbages, "Cult" , transform, DevWt = HeadWt - mean(HeadWt))
cb
 Cult Date HeadWt VitC       DevWt
  c39  d16    2.5   51 -0.40666667
  c39  d16    2.2   55 -0.70666667
 ...
  c52  d21    1.5   66 -0.78000000
  c52  d21    1.6   72 -0.68000000
ddply() first splits cabbages into separate data frames based on the value of Cult. There are two
levels of Cult, c39 and c52, so there are two data frames. It then applies the transform()
function, with the remaining arguments, to each data frame, and combines the results back into a
single data frame.
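The split, apply, and combine steps described above can be sketched by hand in base R. This is an illustration of the idea, not how plyr is implemented internally; toy is a hypothetical four-row stand-in for cabbages, and the names pieces and combined are made up for this example.

```r
# Hypothetical stand-in for cabbages; the values are illustrative only
toy <- data.frame(
  Cult   = c("c39", "c39", "c52", "c52"),
  HeadWt = c(2.5, 2.2, 1.5, 1.6)
)

# 1. Split: one data frame per level of Cult
pieces <- split(toy, toy$Cult)

# 2. Apply: compute the deviation from the mean within each piece
pieces <- lapply(pieces, function(d) {
  d$DevWt <- d$HeadWt - mean(d$HeadWt)
  d
})

# 3. Combine: stack the pieces back into a single data frame
combined <- do.call(rbind, pieces)
rownames(combined) <- NULL
print(combined)

# For this particular task, base R's ave() gives the per-group mean directly
toy$DevWt <- toy$HeadWt - ave(toy$HeadWt, toy$Cult)
```

The ave() line is a compact alternative when only a single new column is needed; the explicit split and lapply version generalizes more readily to transformations that add several columns at once.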
Notice that the call to ddply() has all the same parts as the previous call to transform(). The
only differences are that the parts are slightly rearranged and that it adds the splitting
variable, in this case Cult.
The before and after results are shown in Figure 15-2:
# The data before normalizing
ggplot(cb, aes(x = Cult, y = HeadWt)) + geom_boxplot()