Review of Basic Data Analytic Methods Using R - Data Science and Big Data Analytics


3.2 Exploratory Data Analysis

So far, this chapter has addressed importing and exporting data in R, basic data types and operations, and generating descriptive statistics. Functions such as summary() can help analysts quickly gauge the magnitude and range of the data, but other aspects, such as linear relationships and distributions, are more difficult to see from descriptive statistics. For example, the following code shows a summary view of a data frame named data with two columns, x and y. The output shows the range of x and y, but it is not clear what relationship, if any, exists between these two variables.

summary(data)
       x                  y
 Min.   :-1.90483   Min.   :-2.16545
 1st Qu.:-0.66321   1st Qu.:-0.71451
 Median : 0.09367   Median :-0.03797
 Mean   : 0.02522   Mean   :-0.02153
 3rd Qu.: 0.65414   3rd Qu.: 0.55738
 Max.   : 2.18471   Max.   : 1.70199
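The passage does not say how data was created. The sketch below assumes it is simulated with rnorm() so that y is x plus Gaussian noise; the seed, the sample size of 50, and the noise level sd = 0.5 are all assumptions, so the printed numbers will not match the output above exactly. It shows that summary() reports each column separately, while a single call to cor() already hints at the linear relationship that the summary leaves hidden.

```r
# Assumption: simulated data; the original text does not show how `data` was built.
set.seed(42)                              # hypothetical seed for reproducibility
x <- rnorm(50)                            # 50 draws from a standard normal
y <- x + rnorm(50, mean = 0, sd = 0.5)    # y depends linearly on x, plus noise
data <- data.frame(x = x, y = y)

# Per-column quantiles and means: the ranges are visible, the x-y link is not
print(summary(data))

# Pearson correlation summarizes the strength of the linear relationship
r <- cor(data$x, data$y)
cat(sprintf("Pearson correlation between x and y: %.3f\n", r))
```

A correlation coefficient is still only one number; it cannot reveal curvature, clusters, or outliers, which is why the text turns to visualization next.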

A useful way to detect patterns and anomalies in the data is exploratory data analysis with visualization. Visualization gives a succinct, holistic view of the data that may be difficult to grasp from numbers and summaries alone.
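One well-known illustration of this point (not part of this passage) is Anscombe's quartet, which ships with base R as the anscombe data frame (columns x1 to x4 and y1 to y4). The sketch below computes the same descriptive statistics for all four x-y pairs and then draws them side by side; the statistics agree closely, while the scatterplots look very different.

```r
# Anscombe's quartet is bundled with base R in the datasets package
data(anscombe)

# Mean, variance, and correlation for each of the four x-y pairs
quartet_stats <- sapply(1:4, function(i) {
  xi <- anscombe[[paste0("x", i)]]
  yi <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(xi), mean_y = mean(yi),
    var_x = var(xi), cor_xy = cor(xi, yi))
})
colnames(quartet_stats) <- paste0("set", 1:4)
print(round(quartet_stats, 3))   # nearly identical columns

# The plots, however, reveal four very different structures
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Set", i))
}
par(op)   # restore the previous layout
```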

The variables x and y of the data frame data can instead be visualized in a scatterplot (Figure 3.5), which readily depicts the relationship between two variables. As an important facet of initial data exploration, visualization helps assess data cleanliness and suggests potentially important relationships in the data prior to the model planning and building phases.
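A scatterplot like the one described can be drawn with base R's plot(). This is a minimal sketch, not the code behind Figure 3.5: it assumes data holds numeric columns x and y, which it simulates with rnorm() (seed and noise level are assumptions) so the block runs on its own. It writes to a temporary PDF file so it also works in a non-interactive session, and the least-squares line added with abline() is an optional extra, not something the text calls for.

```r
# Assumption: simulated stand-in for the text's `data` frame
set.seed(42)
x <- rnorm(50)
y <- x + rnorm(50, mean = 0, sd = 0.5)
data <- data.frame(x = x, y = y)

# Send output to a temporary PDF so the sketch runs without a screen device
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# One point per row: x on the horizontal axis, y on the vertical axis
plot(data$x, data$y,
     main = "Scatterplot of x and y",
     xlab = "x", ylab = "y")

# Optional: overlay a least-squares fit to make the linear trend explicit
abline(lm(y ~ x, data = data), col = "red")

invisible(dev.off())   # close the device and flush the file
```

In an interactive session, dropping the pdf() and dev.off() calls displays the plot on screen instead.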
