# to simplify the function calls, assign
x <- sales$sales_total
y <- sales$num_of_orders
cor(x,y)
# returns 0.7508015 (correlation)
cov(x,y)
# returns 345.2111 (covariance)
IQR(x)
# returns 215.21 (interquartile range)
mean(x)
# returns 249.4557 (mean)
median(x)
# returns 151.65 (median)
range(x)
# returns 30.02 7606.09 (min max)
sd(x)
# returns 319.0508 (std. dev.)
var(x)
# returns 101793.4 (variance)
The IQR() function returns the interquartile range, that is, the difference
between the third and the first quartiles. The other functions are named for the
statistics they compute. Readers are encouraged to review the help files (for
example, by entering ?IQR in the console) for acceptable inputs and available
options.
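To make the definition of IQR() concrete, the following minimal sketch compares it with the difference of the quartiles returned by quantile(). The vector x here is a small made-up example, not the sales data, and iqr_manual is a name introduced only for illustration.

```r
# Hypothetical data: a small numeric vector standing in for sales$sales_total
x <- c(30.02, 80.5, 151.65, 295.5, 7606.09)

# quantile() returns the requested quantiles;
# unname() drops the "25%" and "75%" labels
q <- unname(quantile(x, probs = c(0.25, 0.75)))

# the interquartile range is the third quartile minus the first quartile
iqr_manual <- q[2] - q[1]

# TRUE when IQR() and the manual calculation agree
all.equal(IQR(x), iqr_manual)
```

Both IQR() and quantile() use the same default quantile algorithm (type = 7), so the two results should agree unless a different type is requested in one of the calls.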
The function apply() is useful when the same function is to be applied to several
variables in a data frame. For example, the following R code calculates the standard
deviation for the first three variables in sales. In the code, setting MARGIN=2
specifies that the sd() function is applied over the columns. Other functions, such
as lapply() and sapply(), apply a function to each element of a list or vector.
Readers can refer to the R help files to learn how to use these functions.
apply(sales[,c(1:3)], MARGIN=2, FUN=sd)
      cust_id   sales_total num_of_orders
  2886.895680    319.050782      1.441119
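The apply() call above can be compared with sapply() and lapply(), which treat a data frame as a list of columns. The sketch below uses a small made-up data frame, df, as a stand-in for the first three columns of sales; the values and the names sd_apply, sd_sapply, and sd_lapply are assumptions for illustration only.

```r
# Hypothetical stand-in for the first three columns of sales
df <- data.frame(
  cust_id       = c(100001, 100002, 100003, 100004),
  sales_total   = c(800.64, 217.53, 74.58, 498.60),
  num_of_orders = c(3, 3, 2, 3)
)

# apply() coerces the data frame to a matrix, then applies sd() to each column
sd_apply <- apply(df, MARGIN = 2, FUN = sd)

# sapply() applies sd() to each column and simplifies the result
# to a named numeric vector
sd_sapply <- sapply(df, sd)

# lapply() applies sd() to each column but always returns a list
sd_lapply <- lapply(df, sd)

# TRUE when the apply() and sapply() results agree
all.equal(sd_apply, sd_sapply)
```

Because apply() converts its input to a matrix, it is best suited to data frames whose columns are all numeric; sapply() and lapply() work column by column and avoid that conversion.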
Additional descriptive statistics can be applied with user-defined functions. The
following R code defines a function, my_range(), to compute the difference
between the maximum and minimum values returned by the range() function.
In general, user-defined functions are useful for any task or operation that needs
to be frequently repeated. More information on user-defined functions is available
by entering help("function") in the console.
# build a function to provide the difference between
# the maximum and the minimum values
my_range <- function(v) {range(v)[2] - range(v)[1]}
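As a quick check, my_range() can be called on a small vector and compared with diff(range(v)), a built-in way to obtain the same quantity. The definition is repeated so that the block runs on its own; the vector v is made up for illustration and borrows only the minimum and maximum reported earlier for sales_total.

```r
# definition repeated from the text so this block is self-contained
my_range <- function(v) {range(v)[2] - range(v)[1]}

# Hypothetical vector whose minimum and maximum match the range(x) output above
v <- c(30.02, 151.65, 7606.09)

# difference between the maximum and the minimum values
my_range(v)

# diff() on the two-element result of range() gives the same value
all.equal(my_range(v), diff(range(v)))
```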