Summarizing Data by Groups
Problem
You want to summarize your data, based on one or more grouping variables.
Solution
Use ddply() from the plyr package with the summarise() function, and specify the operations
to perform:
library(MASS) # For the data set
library(plyr)
ddply(cabbages, c("Cult", "Date"), summarise, Weight = mean(HeadWt),
      VitC = mean(VitC))
Cult Date Weight VitC
 c39  d16   3.18 50.3
 c39  d20   2.80 49.4
 c39  d21   2.74 54.8
 c52  d16   2.26 62.5
 c52  d20   3.11 58.9
 c52  d21   1.47 71.8
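To see what each row of this result represents, one group can be recomputed by hand. The following is a minimal sketch rather than part of the recipe itself; the names by_group, one_group, and manual are illustrative.

```r
library(MASS)  # For the cabbages data set
library(plyr)

# Grouped summary, as in the solution above
by_group <- ddply(cabbages, c("Cult", "Date"), summarise,
                  Weight = mean(HeadWt),
                  VitC = mean(VitC))

# Recompute a single group (Cult c39, Date d16) directly with subset()
one_group <- subset(cabbages, Cult == "c39" & Date == "d16")
manual <- c(Weight = mean(one_group$HeadWt), VitC = mean(one_group$VitC))

# The matching row of the ddply() result should hold the same two means
by_group[by_group$Cult == "c39" & by_group$Date == "d16", ]
manual
```

In other words, ddply() splits the data frame by every combination of the grouping variables, applies summarise() to each piece, and stacks the one-row results.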
Discussion
Let's take a closer look at the cabbages data set. It has two factors that can be used as grouping
variables: Cult, which has levels c39 and c52, and Date, which has levels d16, d20, and d21. It
also has two numeric variables, HeadWt and VitC:
cabbages
Cult Date HeadWt VitC
 c39  d16    2.5   51
 c39  d16    2.2   55
 ...
 c52  d21    1.5   66
 c52  d21    1.6   72
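The structure described above can also be checked without printing the whole data frame. This is a small sketch using only base R functions; the per-group row counts it prints are not stated in the text above.

```r
library(MASS)  # For the cabbages data set

str(cabbages)                        # Column types: two factors, two numeric columns
levels(cabbages$Cult)                # Levels of the first grouping variable
levels(cabbages$Date)                # Levels of the second grouping variable
table(cabbages$Cult, cabbages$Date)  # Number of rows in each Cult/Date group
```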
Finding the overall mean of HeadWt is simple. We could just use the mean() function on that
column, but for reasons that will soon become clear, we'll use the summarise() function
instead:
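A minimal sketch of this comparison, assuming summarise() is called directly on the whole data frame with no grouping. The column name Weight mirrors the solution above, and overall is an illustrative name.

```r
library(MASS)  # For the cabbages data set
library(plyr)

# Overall mean using mean() on the column: returns a single number
mean(cabbages$HeadWt)

# Same calculation with summarise(): returns a one-row data frame
overall <- summarise(cabbages, Weight = mean(HeadWt))
overall
```

The value is the same either way; the difference is that summarise() returns a data frame with a named column, which is the form ddply() combines across groups.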