Getting Your Data into Shape - R Graphics


 c52  d21    1.5   66
 c52  d21    1.6   72

Suppose we want to find, for each case, the deviation of HeadWt from the overall mean. All we

have to do is take the overall mean and subtract it from the observed value for each case:

transform(cabbages, DevWt = HeadWt - mean(HeadWt))

Cult Date HeadWt VitC        DevWt
 c39  d16    2.5   51 -0.093333333
 c39  d16    2.2   55 -0.393333333
...
 c52  d21    1.5   66 -1.093333333
 c52  d21    1.6   72 -0.993333333
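The same column can be built with a plain assignment, which makes the arithmetic explicit. This is a minimal sketch, assuming that cabbages is the data set shipped with the MASS package; the names cab and overall_mean are made up for illustration.

```r
library(MASS)  # assumption: cabbages is the data set from MASS

# Work on a copy so the original data frame is left untouched
cab <- cabbages

# Step 1: compute the overall mean once
overall_mean <- mean(cab$HeadWt)

# Step 2: subtract it from every observed value
cab$DevWt <- cab$HeadWt - overall_mean

# Deviations from the overall mean sum to zero (up to rounding error)
sum(cab$DevWt)

# transform() produces the same column
all.equal(cab$DevWt,
          transform(cabbages, DevWt = HeadWt - mean(HeadWt))$DevWt)
```

The transform() form is shorter because column names can be used directly, without the cab$ prefix.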

You'll often want to do separate operations like this for each group, where the groups are specified by one or more grouping variables. Suppose, for example, we want to normalize the data within each group by finding the deviation of each case from the mean within the group, where the groups are specified by Cult. In these cases, we can use ddply() from the plyr package with the transform() function:

library(plyr)

cb <- ddply(cabbages, "Cult", transform, DevWt = HeadWt - mean(HeadWt))

cb

Cult Date HeadWt VitC       DevWt
 c39  d16    2.5   51 -0.40666667
 c39  d16    2.2   55 -0.70666667
...
 c52  d21    1.5   66 -0.78000000
 c52  d21    1.6   72 -0.68000000

ddply() first splits cabbages into separate data frames based on the value of Cult. There are two levels of Cult, c39 and c52, so there are two data frames. It then applies the transform() function, with the remaining arguments, to each data frame, and combines the results into a single data frame.
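The split, apply, and combine steps can be written out by hand in base R. This is only a sketch of the idea, not how plyr is implemented; it assumes cabbages comes from the MASS package, and the names pieces and cb_base are made up for illustration.

```r
library(MASS)  # assumption: cabbages is the data set from MASS

# 1. Split: one data frame per level of Cult (c39 and c52)
pieces <- split(cabbages, cabbages$Cult)

# 2. Apply: add the within-group deviation to each piece separately,
#    so mean(HeadWt) is the mean of that group only
pieces <- lapply(pieces, function(d) {
  transform(d, DevWt = HeadWt - mean(HeadWt))
})

# 3. Combine: stack the pieces back into a single data frame
cb_base <- do.call(rbind, pieces)
rownames(cb_base) <- NULL

# Within each group, the deviations now average to zero
# (up to rounding error)
tapply(cb_base$DevWt, cb_base$Cult, mean)
```

Compared with this version, ddply() does the same three steps in a single call.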

Notice that the call to ddply() has all the same parts as the previous call to transform(). The only differences are that the parts are slightly rearranged and that the splitting variable, in this case Cult, is added.

The before and after results are shown in Figure 15-2:

library(ggplot2)

# The data before normalizing
ggplot(cb, aes(x = Cult, y = HeadWt)) + geom_boxplot()
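For the normalized data, a plausible counterpart is the same boxplot with DevWt on the y-axis instead of HeadWt. The following is a sketch under that assumption. It also assumes that cabbages comes from the MASS package, and p_after is a made-up name.

```r
library(MASS)     # assumption: cabbages is the data set from MASS
library(plyr)
library(ggplot2)

# Rebuild cb so that this block runs on its own
cb <- ddply(cabbages, "Cult", transform, DevWt = HeadWt - mean(HeadWt))

# The data after normalizing (assumed counterpart to the plot above):
# each group's deviations are centered on that group's own mean
p_after <- ggplot(cb, aes(x = Cult, y = DevWt)) + geom_boxplot()
p_after
```

Because each group's mean has been subtracted, the two boxes should sit around zero, which makes their spreads easier to compare.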
