• mean(x)
• median(x)
• sd(x)
• var(x)
• cor(x, y)
• cov(x, y)
Discussion
When I first opened the documentation for R, I began searching for material called
something like “Procedures for Calculating Standard Deviation.” I figured that such an
important topic would likely require a whole chapter.
It's not that complicated.
Standard deviation and other basic statistics are calculated by simple functions.
Ordinarily, the function argument is a vector of numbers, and the function returns the
calculated statistic:
> x <- c(0,1,1,2,3,5,8,13,21,34)
> mean(x)
[1] 8.8
> median(x)
[1] 4
> sd(x)
[1] 11.03328
> var(x)
[1] 121.7333
The sd function calculates the sample standard deviation, and var calculates the sample
variance.
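To make "sample" concrete: both functions divide by n - 1 rather than n. The following is a minimal sketch that recomputes the two values by hand for the same vector. The names n, manual_var, and manual_sd are illustrative only and are not part of R.

```r
x <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Sample standard deviation: square root of the sample variance
manual_sd <- sqrt(manual_var)

# Compare the hand-computed values with the built-in functions
print(all.equal(manual_var, var(x)))
print(all.equal(manual_sd, sd(x)))
```

Both comparisons should report TRUE. Dividing by n instead would give the population versions, which sd and var do not compute.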
The cor and cov functions can calculate the correlation and covariance, respectively,
between two vectors:
> x <- c(0,1,1,2,3,5,8,13,21,34)
> y <- log(x+1)
> cor(x,y)
[1] 0.9068053
> cov(x,y)
[1] 11.49988
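The two results are related: the (Pearson) correlation is the covariance rescaled by the two standard deviations. A short sketch, assuming the same x and y as above, checks this. The name manual_cor is illustrative only.

```r
x <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
y <- log(x + 1)

# Pearson correlation = covariance divided by the product of the standard deviations
manual_cor <- cov(x, y) / (sd(x) * sd(y))

# Compare with the built-in function (cor defaults to the Pearson method)
print(all.equal(manual_cor, cor(x, y)))
```

This is why cor always falls between -1 and 1, while cov depends on the units of x and y.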
All these functions are picky about values that are not available (NA). Even one NA
value in the vector argument causes any of these functions to return NA, or even halt
altogether with a cryptic error:
> x <- c(0,1,1,2,3,NA)
> mean(x)
[1] NA
> sd(x)
[1] NA
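One common way around this is the na.rm argument, which mean, median, sd, and var all accept. A minimal sketch, using the same vector. The names clean_mean and clean_sd are illustrative only.

```r
x <- c(0, 1, 1, 2, 3, NA)

# na.rm = TRUE drops the NA values before the statistic is computed
clean_mean <- mean(x, na.rm = TRUE)
clean_sd <- sd(x, na.rm = TRUE)

print(clean_mean)  # 1.4, the mean of 0, 1, 1, 2, 3
print(clean_sd)
```

cor and cov handle missing values through their use argument instead (for example, use = "complete.obs"). Whether silently dropping NA values is appropriate depends on the data.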