It's possible to do more than take the mean. You may, for example, want to compute the standard
deviation and count of each group. To get the standard deviation, use the sd() function, and to
get a count, use the length() function:
ddply(cabbages, c("Cult", "Date"), summarise,
      Weight = mean(HeadWt),
      sd = sd(HeadWt),
      n = length(HeadWt))
 Cult Date Weight        sd  n
  c39  d16   3.18 0.9566144 10
  c39  d20   2.80 0.2788867 10
  c39  d21   2.74 0.9834181 10
  c52  d16   2.26 0.4452215 10
  c52  d20   3.11 0.7908505 10
  c52  d21   1.47 0.2110819 10
Other useful functions for generating summary statistics include min(), max(), and median().
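As a minimal sketch of how these functions slot into the same ddply() call, the example below computes the minimum, median, and maximum head weight for each group. It assumes the plyr and MASS packages are installed and that cabbages is the data set shipped with MASS; the column names Min, Median, and Max are illustrative choices, not names from the original text.

```r
# Assumption: plyr and MASS are installed; MASS supplies the cabbages data set.
library(plyr)
library(MASS)

# Summarise head weight within each Cult/Date group.
# Min, Median, and Max are illustrative column names.
spread <- ddply(cabbages, c("Cult", "Date"), summarise,
                Min    = min(HeadWt),
                Median = median(HeadWt),
                Max    = max(HeadWt))
spread
```

Each argument after summarise becomes one column of the result, so further statistics can be added by appending more name = function(HeadWt) pairs.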
Dealing with NAs
One potential pitfall is that NAs in the data will lead to NAs in the output. Let's see what happens
if we sprinkle a few NAs into HeadWt:
c1 <- cabbages                     # Make a copy
c1$HeadWt[c(1, 20, 45)] <- NA      # Set some values to NA
ddply(c1, c("Cult", "Date"), summarise,
      Weight = mean(HeadWt),
      sd = sd(HeadWt),
      n = length(HeadWt))
 Cult Date Weight        sd  n
  c39  d16     NA        NA 10
  c39  d20     NA        NA 10
  c39  d21   2.74 0.9834181 10
  c52  d16   2.26 0.4452215 10
  c52  d20     NA        NA 10
  c52  d21   1.47 0.2110819 10
There are two problems here. The first problem is that mean() and sd() simply return NA if any
of the input values are NA. Fortunately, these functions have an option to deal with this very
issue: setting na.rm=TRUE will tell them to ignore the NAs.
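The effect of na.rm=TRUE is easiest to see on a small vector. The sketch below uses a made-up vector x (not taken from the cabbages data) and hypothetical variable names, and compares the results with and without the option.

```r
# Illustrative vector with one missing value (hypothetical data).
x <- c(2.5, NA, 3.1, 1.9)

m_na <- mean(x)                 # NA, because one input value is missing
m_ok <- mean(x, na.rm = TRUE)   # mean of the three non-missing values
s_ok <- sd(x, na.rm = TRUE)     # standard deviation of the non-missing values

m_na
m_ok
s_ok
```

The same argument can be passed inside the summarise call, for example Weight = mean(HeadWt, na.rm=TRUE).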
The second problem is that length() counts NAs just like any other value, but since these values
represent missing data, they should be excluded from the count. The length() function doesn't