Getting Your Data into Shape - R Graphics

Discussion

Another method is to calculate the standard error within the call to ddply(). It's not possible to refer to the sd and n columns inside the ddply() call, so we'll have to recalculate them to get se. This will do the same thing as the two-step version shown previously:

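library(MASS)  # for the cabbages data set
library(plyr)  # for ddply() and summarise()

ddply(cabbages, c("Cult", "Date"), summarise,
      Weight = mean(HeadWt, na.rm = TRUE),
      sd = sd(HeadWt, na.rm = TRUE),
      n = sum(!is.na(HeadWt)),
      # Recalculate sd and n here instead of referring to the columns above
      se = sd(HeadWt, na.rm = TRUE) / sqrt(sum(!is.na(HeadWt))))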
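For comparison, the same one-pass summary can be sketched in base R without plyr. This is only an illustration, not the method used in this recipe: summarize_weights and ca_base are hypothetical names, and the cabbages data set is assumed to come from the MASS package.

```r
# Sketch: one-pass group summary in base R (a swapped-in alternative to ddply).
# Assumes the MASS package, which provides the cabbages data set.
library(MASS)

# Hypothetical helper: summary statistics for one numeric vector.
summarize_weights <- function(x) {
  n <- sum(!is.na(x))        # number of non-missing values
  s <- sd(x, na.rm = TRUE)   # sample standard deviation
  c(Weight = mean(x, na.rm = TRUE), sd = s, n = n, se = s / sqrt(n))
}

# Split HeadWt by every Cult/Date combination, then summarize each group.
groups <- split(cabbages$HeadWt, cabbages[c("Cult", "Date")])
ca_base <- t(sapply(groups, summarize_weights))
ca_base
```

The result is a numeric matrix with one row per group (row names such as c39.d16) rather than a data frame, so it is less convenient for plotting than the ddply() output.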

Confidence Intervals

Confidence intervals are calculated using the standard error of the mean and the degrees of freedom. To calculate a confidence interval, use the qt() function to get the quantile, then multiply that by the standard error. The qt() function will give quantiles of the t-distribution when given a probability level and degrees of freedom. For a 95% confidence interval, use a probability level of .975; for the bell-shaped t-distribution, this will in essence cut off 2.5% of the area under the curve at either end. The degrees of freedom equal the sample size minus one.
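As a minimal sketch of that calculation on a single sample, the following computes a 95% interval by hand and cross-checks it against t.test(). The vector x holds made-up values for illustration only; it is not taken from the cabbages data.

```r
# Hypothetical sample of ten measurements (illustrative values only).
x <- c(4.1, 3.8, 5.0, 4.4, 4.7, 3.9, 4.2, 4.6, 4.0, 4.3)

n    <- length(x)
se   <- sd(x) / sqrt(n)          # standard error of the mean
mult <- qt(0.975, df = n - 1)    # t quantile for a 95% interval
ci   <- se * mult                # half-width of the confidence interval

c(lower = mean(x) - ci, upper = mean(x) + ci)

# Cross-check: t.test() reports the same interval for a one-sample test.
t.test(x)$conf.int
```

The value stored in ci is the half-width of the interval, so the interval itself runs from the mean minus ci to the mean plus ci.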

This will calculate the multiplier for each group. There are six groups and each has the same

number of observations (10), so they will all have the same multiplier:

ciMult <- qt(.975, ca$n - 1)

ciMult

# 2.262157 2.262157 2.262157 2.262157 2.262157 2.262157

Now we can multiply that vector by the standard error to get the 95% confidence interval:

ca$ci <- ca$se * ciMult

 Cult Date Weight        sd  n         se        ci
  c39  d16   3.18 0.9566144 10 0.30250803 0.6843207
  c39  d20   2.80 0.2788867 10 0.08819171 0.1995035
  c39  d21   2.74 0.9834181 10 0.31098410 0.7034949
  c52  d16   2.26 0.4452215 10 0.14079141 0.3184923
  c52  d20   3.11 0.7908505 10 0.25008887 0.5657403
  c52  d21   1.47 0.2110819 10 0.06674995 0.1509989

We could have done this all in one line, like this:

ca$ci95 <- ca$se * qt(.975, ca$n - 1)

For a 99% confidence interval, use .995.
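The relationship between the confidence level and the probability passed to qt() can be sketched with a small helper. The function name ci_multiplier and its conf_level argument are hypothetical and appear only for illustration.

```r
# Hypothetical helper: t multiplier for a two-sided confidence interval.
ci_multiplier <- function(n, conf_level = 0.95) {
  # Split the excluded area evenly between the two tails:
  # 0.95 gives 0.975, and 0.99 gives 0.995.
  prob <- 1 - (1 - conf_level) / 2
  qt(prob, df = n - 1)
}

mult95 <- ci_multiplier(10)         # 95% interval with 10 observations
mult99 <- ci_multiplier(10, 0.99)   # 99% interval with 10 observations
c(mult95, mult99)
```

A higher confidence level gives a larger multiplier, and therefore a wider interval for the same standard error.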
