Database Reference
was loaded properly as well as to become familiar with the data. In the example,
the head() function, by default, displays the first six records of sales.
# examine the imported dataset
head(sales)
  cust_id sales_total num_of_orders gender
1  100001      800.64             3      F
2  100002      217.53             3      F
3  100003       74.58             2      M
4  100004      498.60             3      M
5  100005      723.11             4      F
6  100006       69.43             2      F
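The default of six records can be changed with the n argument of head(). The sketch below is illustrative only: sales_demo is a hypothetical stand-in built from the six rows printed above, not the full imported dataset, and the integer and factor column types are assumptions.

```r
# Hypothetical stand-in for the sales data frame, rebuilt from the
# six rows printed above (column types are assumed)
sales_demo <- data.frame(
  cust_id       = 100001:100006,
  sales_total   = c(800.64, 217.53, 74.58, 498.60, 723.11, 69.43),
  num_of_orders = c(3L, 3L, 2L, 3L, 4L, 2L),
  gender        = factor(c("F", "F", "M", "M", "F", "F"))
)

# n controls how many records head() returns
first_three <- head(sales_demo, n = 3)

# a negative n returns all records except the last |n|
all_but_last <- head(sales_demo, n = -1)

# tail() is the counterpart that returns the last records
last_two <- tail(sales_demo, n = 2)

print(first_three)
print(last_two)
```

Checking both ends of a dataset with head() and tail() is a quick way to spot a truncated or misaligned import.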
The summary() function provides some descriptive statistics, such as the mean
and median, for each data column. Additionally, the minimum and maximum
values as well as the 1st and 3rd quartiles are provided. Because the gender
column contains two possible characters, an “F” (female) or “M” (male), the
summary() function provides the count of each character's occurrence.
summary(sales)
    cust_id        sales_total      num_of_orders    gender
 Min.   :100001   Min.   :  30.02   Min.   : 1.000   F:5035
 1st Qu.:102501   1st Qu.:  80.29   1st Qu.: 2.000   M:4965
 Median :105001   Median : 151.65   Median : 2.000
 Mean   :105001   Mean   : 249.46   Mean   : 2.428
 3rd Qu.:107500   3rd Qu.: 295.50   3rd Qu.: 3.000
 Max.   :110000   Max.   :7606.09   Max.   :22.000
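The per-level counts shown for gender appear only when the column is stored as a factor. Since R 4.0.0, read.csv() keeps text columns as character vectors by default, in which case summary() reports only the length, class, and mode. The sketch below illustrates the difference on a small hypothetical vector (the gender values of the six rows printed earlier), not on the full dataset.

```r
# Hypothetical gender values taken from the six rows printed earlier
gender_chr <- c("F", "F", "M", "M", "F", "F")

# On a character vector, summary() reports Length, Class, and Mode
chr_summary <- summary(gender_chr)
print(chr_summary)

# After conversion to a factor, summary() counts each level
gender_fct <- factor(gender_chr)
fct_summary <- summary(gender_fct)
print(fct_summary)

# table() gives the same counts without converting the column
gender_counts <- table(gender_chr)
print(gender_counts)
```

If the counts are wanted directly from the import, passing stringsAsFactors = TRUE to read.csv() is one option; converting a single column with factor() afterwards is another.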
Plotting a dataset's contents can provide information about the relationships
between the various columns. In this example, the plot() function generates a
scatterplot with the number of orders (sales$num_of_orders) on the x-axis and the
annual sales (sales$sales_total) on the y-axis. The $ operator references a specific
column in the dataset sales. The resulting plot is shown in Figure 3.1.
# plot num_of_orders vs. sales
plot(sales$num_of_orders, sales$sales_total,
     main="Number of Orders vs. Sales")
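As a small, self-contained variation on the call above, the sketch below labels both axes and writes the plot to a file so that it also runs in a non-interactive session. The data frame sales_demo, the axis label text, and the temporary PDF file are assumptions made for illustration; they are not part of the original example.

```r
# Hypothetical stand-in for the sales data frame (six rows only)
sales_demo <- data.frame(
  cust_id       = 100001:100006,
  sales_total   = c(800.64, 217.53, 74.58, 498.60, 723.11, 69.43),
  num_of_orders = c(3L, 3L, 2L, 3L, 4L, 2L)
)

# `$` extracts one column as a vector; `[[ ]]` with the column
# name returns the same vector
orders <- sales_demo$num_of_orders
same_column <- identical(orders, sales_demo[["num_of_orders"]])

# Write the scatterplot to a temporary PDF file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(sales_demo$num_of_orders, sales_demo$sales_total,
     main = "Number of Orders vs. Sales",
     xlab = "Number of orders",   # first argument: x-axis
     ylab = "Annual sales")       # second argument: y-axis
invisible(dev.off())              # close the device to finish the file
```

In an interactive session the pdf() and dev.off() calls can be dropped, and the plot then appears in the active graphics window.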