Summarizing Data by Groups
Problem
You want to summarize your data, based on one or more grouping variables.
Solution
Use ddply() from the plyr package with the summarise() function, and specify the operations to do:
library(MASS)  # For the data set
library(plyr)

ddply(cabbages, c("Cult", "Date"), summarise,
      Weight = mean(HeadWt),
      VitC = mean(VitC))

 Cult Date Weight VitC
  c39  d16   3.18 50.3
  c39  d20   2.80 49.4
  c39  d21   2.74 54.8
  c52  d16   2.26 62.5
  c52  d20   3.11 58.9
  c52  d21   1.47 71.8
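
As a cross-check on the group means above, the same summary can be sketched with base R's aggregate(). This is a swapped-in technique, not the plyr approach the recipe uses, and the name agg is only illustrative. The rows may come back in a different order than ddply() returns them.

```r
library(MASS)  # provides the cabbages data set

# Base R cross-check (not the plyr method used in this recipe):
# mean of HeadWt and VitC for each combination of Cult and Date
agg <- aggregate(cbind(HeadWt, VitC) ~ Cult + Date,
                 data = cabbages, FUN = mean)
agg
```

The result keeps the original column names (HeadWt, VitC) instead of the renamed Weight column, which is one reason the ddply() and summarise() form is convenient.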
Discussion
Let's take a closer look at the cabbages data set. It has two factors that can be used as grouping variables: Cult, which has levels c39 and c52, and Date, which has levels d16, d20, and d21. It also has two numeric variables, HeadWt and VitC:
cabbages

 Cult Date HeadWt VitC
  c39  d16    2.5   51
  c39  d16    2.2   55
 ...
  c52  d21    1.5   66
  c52  d21    1.6   72
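
To confirm the structure described above without printing the whole data frame, a minimal sketch like the following can be used. It relies only on base R functions and the cabbages data from MASS.

```r
library(MASS)  # provides the cabbages data set

# Levels of the two grouping factors
levels(cabbages$Cult)
levels(cabbages$Date)

# Class of each column: the grouping variables are factors,
# and HeadWt and VitC hold numbers
sapply(cabbages, class)

# Number of rows in each Cult/Date combination
table(cabbages$Cult, cabbages$Date)
```

Checking the cell counts first is useful because each group mean is computed from only the rows in that combination of Cult and Date.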
Finding the overall mean of HeadWt is simple. We could just use the mean() function on that column, but for reasons that will soon become clear, we'll use the summarise() function instead: