Getting Your Data into Shape - R Graphics

It's possible to do more than take the mean. You may, for example, want to compute the standard

deviation and count of each group. To get the standard deviation, use the sd() function, and to

get a count, use the length() function:

library(MASS)  # For the cabbages data set
library(plyr)  # For ddply()

ddply(cabbages, c("Cult", "Date"), summarise,
      Weight = mean(HeadWt),
      sd = sd(HeadWt),
      n = length(HeadWt))

 Cult Date Weight        sd  n
  c39  d16   3.18 0.9566144 10
  c39  d20   2.80 0.2788867 10
  c39  d21   2.74 0.9834181 10
  c52  d16   2.26 0.4452215 10
  c52  d20   3.11 0.7908505 10
  c52  d21   1.47 0.2110819 10

Other useful functions for generating summary statistics include min(), max(), and median().
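As a rough illustration of those other summary functions, here is a minimal base-R sketch. It uses tapply() rather than ddply() so that it runs without extra packages, and the data frame `toy` with its `group` and `value` columns is a hypothetical stand-in, not data from the text.

```r
# Hypothetical toy data (an assumption for illustration, not the cabbages data)
toy <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  value = c(1, 2, 6, 10, 20, 30)
)

# Apply each summary function to the values within each group
mins <- tapply(toy$value, toy$group, min)
maxs <- tapply(toy$value, toy$group, max)
meds <- tapply(toy$value, toy$group, median)

# Collect the per-group results into one data frame
stats <- data.frame(
  group  = names(mins),
  min    = as.vector(mins),
  max    = as.vector(maxs),
  median = as.vector(meds)
)
print(stats)
```

With ddply() and summarise, the same functions can be supplied as additional named arguments, for example median = median(HeadWt).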

Dealing with NAs

One potential pitfall is that NAs in the data will lead to NAs in the output. Let's see what happens
if we sprinkle a few NAs into HeadWt:

c1 <- cabbages                    # Make a copy
c1$HeadWt[c(1, 20, 45)] <- NA     # Set some values to NA

ddply(c1, c("Cult", "Date"), summarise,
      Weight = mean(HeadWt),
      sd = sd(HeadWt),
      n = length(HeadWt))

 Cult Date Weight        sd  n
  c39  d16     NA        NA 10
  c39  d20     NA        NA 10
  c39  d21   2.74 0.9834181 10
  c52  d16   2.26 0.4452215 10
  c52  d20     NA        NA 10
  c52  d21   1.47 0.2110819 10

There are two problems here. The first problem is that mean() and sd() simply return NA if any
of the input values are NA. Fortunately, these functions have an option to deal with this very
issue: setting na.rm=TRUE will tell them to ignore the NAs.
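The effect of na.rm=TRUE can be seen on a single vector. This is a small sketch; the vector `w` holds hypothetical values chosen for illustration and is not taken from the cabbages data.

```r
# Hypothetical weights with one missing value (an assumption for illustration)
w <- c(2.5, NA, 3.5, 4.0)

m_default <- mean(w)                 # NA, because one input value is NA
m_clean   <- mean(w, na.rm = TRUE)   # mean of the three non-missing values
s_default <- sd(w)                   # NA for the same reason
s_clean   <- sd(w, na.rm = TRUE)     # sd of the three non-missing values

print(c(m_default, m_clean, s_default, s_clean))
```

Inside a ddply() call with summarise, the same argument can be passed in each summary expression, for example Weight = mean(HeadWt, na.rm=TRUE).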

The second problem is that length() counts NAs just like any other value, but since these values
represent missing data, they should be excluded from the count. The length() function doesn't
