Review of Basic Data Analytic Methods Using R - Data Science and Big Data Analytics

# import a CSV file of the total annual sales for each customer
sales <- read.csv("c:/data/yearly_sales.csv")
is.data.frame(sales)   # returns TRUE

As seen earlier, the variables stored in the data frame can be easily accessed using the $ notation. The following R code illustrates that in this example, each variable is a vector with the exception of gender, which was, by a read.csv() default, imported as a factor. (That default applies to R versions before 4.0.0; from 4.0.0 onward, read.csv() imports strings as character vectors unless stringsAsFactors = TRUE is supplied.) Discussed in detail later in this section, a factor denotes a categorical variable, typically with a small, finite set of levels such as “F” and “M” in the case of gender.

length(sales$num_of_orders)    # returns 10000 (number of customers)
is.vector(sales$cust_id)       # returns TRUE
is.vector(sales$sales_total)   # returns TRUE
is.vector(sales$num_of_orders) # returns TRUE
is.vector(sales$gender)        # returns FALSE
is.factor(sales$gender)        # returns TRUE
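
Whether gender arrives as a factor depends on how read.csv() treats strings, so it can help to request the behavior explicitly. The following is a minimal sketch, not the book's own code: the three-row CSV text, the temporary file, and the names sales_demo and sales_chr are hypothetical stand-ins for yearly_sales.csv, and the values are made up.

```r
# Hypothetical stand-in for yearly_sales.csv (made-up values), written to a temp file
csv_text <- c(
  "cust_id,sales_total,num_of_orders,gender",
  "1,100.5,3,F",
  "2,200.5,1,F",
  "3,50.5,2,M"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# Request factors explicitly so the result does not depend on the R version
sales_demo <- read.csv(tmp, stringsAsFactors = TRUE)
is.factor(sales_demo$gender)    # TRUE
levels(sales_demo$gender)       # "F" "M"
is.vector(sales_demo$gender)    # FALSE: a factor carries levels and class attributes

# Alternative: keep strings as character, then convert only the column of interest
sales_chr <- read.csv(tmp, stringsAsFactors = FALSE)
is.character(sales_chr$gender)  # TRUE
sales_chr$gender <- factor(sales_chr$gender)
nlevels(sales_chr$gender)       # 2
```

Converting a single column with factor() keeps other text columns, if any, as plain character vectors, which is often the safer choice for identifiers or free text.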

Because of their flexibility to handle many data types, data frames are the preferred input format for many of the modeling functions available in R. The following use of the str() function provides the structure of the sales data frame. This function identifies the integer and numeric (double) data types, the factor variables and levels, as well as the first few values for each variable.

str(sales) # display structure of the data frame object
'data.frame': 10000 obs. of 4 variables:
 $ cust_id      : int 100001 100002 100003 100004 100005 100006 …
 $ sales_total  : num 800.6 217.5 74.6 498.6 723.1 …
 $ num_of_orders: int 3 3 2 3 4 2 2 2 2 2 …
 $ gender       : Factor w/ 2 levels "F","M": 1 1 2 2 1 1 2 2 1 2 …

In the simplest sense, data frames are lists of variables of the same length. A subset of the data frame can be retrieved through subsetting operators. R's subsetting operators are powerful in that they allow one to express complex operations in a succinct fashion and easily retrieve a subset of the dataset.
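
The list-of-equal-length-variables view and the subsetting operators can be sketched on a small example. This is an illustration under stated assumptions rather than the book's own listing: sales_demo is a hypothetical four-row miniature of the sales data frame with made-up values.

```r
# Hypothetical miniature of the sales data frame (values are made up)
sales_demo <- data.frame(
  cust_id       = c(1L, 2L, 3L, 4L),
  sales_total   = c(100.5, 200.5, 50.5, 400.5),
  num_of_orders = c(3L, 1L, 2L, 3L),
  gender        = factor(c("F", "F", "M", "M"))
)

is.list(sales_demo)          # TRUE: a data frame is a list of columns
sapply(sales_demo, length)   # each of the four columns has length 4

sales_demo[, 4]                               # fourth column, returned as a factor
sales_demo$gender                             # the same column, selected by name
sales_demo[1:2, ]                             # first two rows, all columns
sales_demo[, c("cust_id", "sales_total")]     # two columns selected by name
sales_demo[sales_demo$gender == "F", ]        # rows where gender is "F"
sales_demo[sales_demo$num_of_orders >= 3, "sales_total"]  # one column, filtered rows
```

The empty position before or after the comma means "all rows" or "all columns", and a logical vector in the row position keeps only the rows where it is TRUE.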
