Likewise, if x is either a data frame or a matrix, cor(x) returns the correlation matrix and cov(x) returns the covariance matrix:
> cor(dframe)
             small     medium         big
small   1.00000000 -0.3711367 -0.08424345
medium -0.37113670  1.0000000 -0.11466070
big    -0.08424345 -0.1146607  1.00000000
> cov(dframe)
             small      medium         big
small   0.34152627 -0.21516416 -0.04005275
medium -0.21516416  0.98411974 -0.09253855
big    -0.04005275 -0.09253855  0.66186326
Alas, the median function does not understand data frames. To calculate the medians of data frame columns, use the lapply function to apply the median function to each column separately.
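As a minimal sketch of that approach, the following builds a small data frame and computes one median per column. The data values are hypothetical and chosen only for illustration. The sapply call is an addition to the text above: it does the same work as lapply but simplifies the result to a named vector.

```r
# Hypothetical data frame, invented for this illustration
dframe <- data.frame(
  small  = c(1, 2, 3, 4, 100),
  medium = c(10, 20, 30, 40, 50),
  big    = c(5, 3, 1, 4, 2)
)

# lapply applies median to each column and returns a list,
# with one element per column
meds <- lapply(dframe, median)

# sapply does the same but simplifies the result
# to a named numeric vector
med_vec <- sapply(dframe, median)

print(meds)
print(med_vec)
```

The list returned by lapply keeps the column names, so an individual median can be retrieved by name, for example meds$small.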
See Also
See Recipe 1.12 for calculating the confidence interval of the mean. See Recipe 1.15 for testing the significance of a correlation.
1.9 Initializing a Data Frame from Column Data
Problem
Your data is organized by columns, and you want to assemble it into a data frame.
Solution
If your data is captured in several vectors and/or factors, use the data.frame function to assemble them into a data frame:
> dfrm <- data.frame(v1, v2, v3, f1, f2)
Use as.data.frame instead if your data is captured in a list that contains vectors and/or factors:
> dfrm <- as.data.frame(list.of.vectors)
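The two calls can be tried side by side with the sketch below. The contents of v1, v2, v3, f1, f2, and list.of.vectors are hypothetical values made up for this illustration. All of the columns here have the same length.

```r
# Hypothetical column data: three numeric vectors and two factors
v1 <- c(1.5, 2.5, 3.5)
v2 <- c(10, 20, 30)
v3 <- c(0.1, 0.2, 0.3)
f1 <- factor(c("a", "b", "a"))
f2 <- factor(c("x", "x", "y"))

# Assemble separate vectors and factors into a data frame;
# the column names are taken from the variable names
dfrm <- data.frame(v1, v2, v3, f1, f2)

# The same data held in a named list (hypothetical)
list.of.vectors <- list(v1 = v1, v2 = v2, v3 = v3, f1 = f1, f2 = f2)

# Convert the list into a data frame; the list names
# become the column names
dfrm2 <- as.data.frame(list.of.vectors)

# Show the structure: 3 observations of 5 variables
str(dfrm)
```

In this sketch both routes yield a data frame with the same columns, so the choice between them depends only on how the data is already stored.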
Discussion
A data frame is a collection of columns, each of which corresponds to an observed
variable (in the statistical sense, not the programming sense). If your data is already
organized into columns, then it's easy to build a data frame.
The data.frame function can construct a data frame from vectors, where each vector is one observed variable. Suppose you have two numeric predictor variables, one categorical predictor variable, and one response variable. The data.frame function can create a data frame from your vectors: