Key Concepts
• Basic features of R
• Data exploration and analysis with R
• Statistical methods for evaluation
The previous chapter presented the six phases of the Data Analytics Lifecycle.
• Phase 1: Discovery
• Phase 2: Data Preparation
• Phase 3: Model Planning
• Phase 4: Model Building
• Phase 5: Communicate Results
• Phase 6: Operationalize
The first three phases involve various aspects of data exploration. In general, the
success of a data analysis project requires a deep understanding of the data. It also
requires a toolbox for mining and presenting the data. Data exploration includes the
study of the data in terms of basic statistical measures and the creation of graphs and
plots to visualize and identify relationships and patterns. Several free and commercial
tools are available for exploring, conditioning, modeling, and presenting data.
Because of its popularity and versatility, the open-source programming language R
is used to illustrate many of the analytical tasks and models presented in this book.
This chapter introduces the basic functionality of the R programming language and
environment. The first section gives an overview of how to use R to acquire, parse,
and filter the data as well as how to obtain some basic descriptive statistics on a
dataset. The second section examines how to use R for exploratory data analysis
through visualization. The final section focuses on statistical inference, such as
hypothesis testing and analysis of variance, in R.
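
As a preview of the three kinds of tasks just described, the following is a minimal sketch in base R. It is not taken from the chapter: the built-in mtcars dataset, the temporary CSV file, and the variable names (cars, small, tt, fit) are all assumptions chosen for illustration.

```r
# Illustrative sketch only: mtcars and all variable names are assumptions,
# not the chapter's own example.

# Acquire and parse: write the built-in data to a temporary CSV, then read it back
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp)
cars <- read.csv(tmp, row.names = 1)

# Filter: keep only the 4- and 6-cylinder cars
small <- subset(cars, cyl %in% c(4, 6))
small$cyl <- factor(small$cyl)

# Basic descriptive statistics on fuel economy
summary(small$mpg)
mean_mpg <- mean(small$mpg)
sd_mpg <- sd(small$mpg)

# Exploratory visualization: fuel economy against weight
plot(small$wt, small$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Hypothesis test: Welch two-sample t-test of mpg between the two groups
tt <- t.test(mpg ~ cyl, data = small)

# Analysis of variance: mpg across all three cylinder groups
fit <- aov(mpg ~ factor(cyl), data = cars)
summary(fit)
```

Each call here (read.csv, subset, summary, plot, t.test, aov) is part of the standard R distribution, so the sketch should run without installing additional packages.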