3.1 Introduction to R
R is a programming language and software framework for statistical analysis and
graphics. Available for use under the GNU General Public License [1], R software
and installation instructions can be obtained via the Comprehensive R Archive
Network (CRAN) [2]. This section provides an overview of the basic functionality of R. In
later chapters, this foundation in R is utilized to demonstrate many of the presented
analytical techniques.
Before delving into specific operations and functions of R later in this chapter, it
is important to understand the flow of a basic R script to address an analytical
problem. The following R code illustrates a typical analytical situation in which a
dataset is imported, the contents of the dataset are examined, and some
model-building tasks are executed. Although the reader may not yet be familiar with the
R syntax, the code can be followed by reading the embedded comments, denoted
by # . In the following scenario, the annual sales in U.S. dollars for 10,000 retail
customers have been provided in the form of a comma-separated-value (CSV) file.
The read.csv() function is used to import the CSV file. The imported dataset is stored in
the R variable sales using the assignment operator <- .
# import a CSV file of the total annual sales for each customer
sales <- read.csv("c:/data/yearly_sales.csv")
# examine the imported dataset
head(sales)
summary(sales)
# plot num_of_orders vs. sales
plot(sales$num_of_orders,sales$sales_total,
main="Number of Orders vs. Sales")
# perform a statistical analysis (fit a linear regression model)
results <- lm(sales$sales_total ~ sales$num_of_orders)
summary(results)
# perform some diagnostics on the fitted model
# plot histogram of the residuals
hist(results$residuals, breaks = 800)
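The script above depends on a file at c:/data/yearly_sales.csv, which may not be available to the reader. The following is a minimal sketch of the same import, examine, and model flow that runs without that file. It rests on several assumptions that are not part of the original example: the data are simulated, the sample size, coefficients, and noise level are arbitrary, and the names n, csv_path, and the temporary file are hypothetical. Only the column names sales_total and num_of_orders are taken from the original script. The plotting steps are omitted so that the sketch also runs in a non-interactive session.

```r
# Hypothetical stand-in data: the real yearly_sales.csv is not reproduced here,
# so a small simulated dataset with the same two column names is written
# to a temporary CSV file. All numeric choices below are arbitrary.
set.seed(42)                                    # make the simulation repeatable
n <- 200
num_of_orders <- sample(1:20, n, replace = TRUE)
sales_total <- 30 + 50 * num_of_orders + rnorm(n, mean = 0, sd = 25)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(sales_total, num_of_orders), csv_path, row.names = FALSE)

# import the CSV file and store it in the variable sales
sales <- read.csv(csv_path)

# examine the imported dataset
head(sales)
summary(sales)

# fit a linear regression of sales_total on num_of_orders;
# the data argument lets the formula refer to columns by name
results <- lm(sales_total ~ num_of_orders, data = sales)
coef(results)

# basic diagnostic: residuals of a least-squares fit with an
# intercept sum to zero, up to floating-point error
sum(residuals(results))
```

Passing the data frame through the data argument of lm() is an alternative to writing sales$sales_total ~ sales$num_of_orders; both forms fit the same model, and the former tends to be easier to read and to reuse with functions such as predict().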
In this example, the data file is imported using the read.csv() function. Once the
file has been imported, it is useful to examine the contents to ensure that the data