was loaded properly as well as to become familiar with the data. In the example, the head() function, by default, displays the first six records of sales.
# examine the imported dataset
head(sales)
  cust_id sales_total num_of_orders gender
1  100001      800.64             3      F
2  100002      217.53             3      F
3  100003       74.58             2      M
4  100004      498.60             3      M
5  100005      723.11             4      F
6  100006       69.43             2      F
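To preview a different number of rows, head() accepts an n argument, and tail() shows the end of the data frame instead. The following is a minimal sketch, assuming a small data frame rebuilt from the six records shown above; the name sales_demo is hypothetical and stands in for the imported sales data.

```r
# Hypothetical stand-in for the imported data: the six records shown above
sales_demo <- data.frame(
  cust_id       = 100001:100006,
  sales_total   = c(800.64, 217.53, 74.58, 498.60, 723.11, 69.43),
  num_of_orders = c(3, 3, 2, 3, 4, 2),
  gender        = factor(c("F", "F", "M", "M", "F", "F"))
)

first_three <- head(sales_demo, n = 3)  # first three records only
last_two    <- tail(sales_demo, n = 2)  # last two records
print(first_three)
print(last_two)

# str() lists each column's type, a quick check that the import behaved
str(sales_demo)
```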
The summary() function provides some descriptive statistics, such as the mean and median, for each numeric column. Additionally, the minimum and maximum values as well as the 1st and 3rd quartiles are provided. Because the gender column contains two possible characters, an "F" (female) or "M" (male), the summary() function provides the count of each character's occurrence.
summary(sales)
    cust_id        sales_total      num_of_orders    gender
 Min.   :100001   Min.   :  30.02   Min.   : 1.000   F:5035
 1st Qu.:102501   1st Qu.:  80.29   1st Qu.: 2.000   M:4965
 Median :105001   Median : 151.65   Median : 2.000
 Mean   :105001   Mean   : 249.46   Mean   : 2.428
 3rd Qu.:107500   3rd Qu.: 295.50   3rd Qu.: 3.000
 Max.   :110000   Max.   :7606.09   Max.   :22.000
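The per-level counts shown for gender appear only when the column is stored as a factor. In R 4.0.0 and later, read.csv() keeps text columns as character vectors by default (stringsAsFactors = FALSE), in which case summary() reports only the length, class, and mode. The sketch below illustrates the difference; the vector gender_chr is a hypothetical stand-in built from the six records shown earlier, not the full dataset.

```r
# Hypothetical stand-in: gender values from the six records shown earlier
gender_chr <- c("F", "F", "M", "M", "F", "F")

# On a character vector, summary() gives Length / Class / Mode only
print(summary(gender_chr))

# Converting to a factor makes summary() count each level
gender_fct    <- factor(gender_chr)
gender_counts <- summary(gender_fct)
print(gender_counts)

# table() yields the same counts without converting first
print(table(gender_chr))
```

If the counts are wanted for the whole data frame, one option is to convert the column before calling summary(), for example with sales$gender <- factor(sales$gender).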
Plotting a dataset's contents can provide information about the relationships between the various columns. In this example, the plot() function generates a scatterplot with the number of orders (sales$num_of_orders) on the horizontal axis and the annual sales (sales$sales_total) on the vertical axis. The $ is used to reference a specific column in the dataset.
# plot num_of_orders vs. sales
plot(sales$num_of_orders, sales$sales_total,
     main = "Number of Orders vs. Sales")
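The $ operator extracts a single column as a vector, and the double-bracket form with the column name in quotes returns the same result. The sketch below shows both, then repeats the scatterplot call with explicit axis labels. It assumes a hypothetical data frame, sales_demo, built from the six records shown earlier; the label text is illustrative only.

```r
# Hypothetical stand-in for sales: the six records shown earlier
sales_demo <- data.frame(
  sales_total   = c(800.64, 217.53, 74.58, 498.60, 723.11, 69.43),
  num_of_orders = c(3, 3, 2, 3, 4, 2)
)

# Both forms extract the same numeric vector
by_dollar  <- sales_demo$sales_total
by_bracket <- sales_demo[["sales_total"]]
print(identical(by_dollar, by_bracket))

# First argument goes on the x-axis, second on the y-axis
plot(sales_demo$num_of_orders, sales_demo$sales_total,
     main = "Number of Orders vs. Sales",
     xlab = "Number of orders",
     ylab = "Sales total")
```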