Overview
In this chapter, you'll learn:
• A practical definition of data science
• What the command line is exactly and how you can use it
• Why the command line is a wonderful environment for doing data science
Data Science Is OSEMN
The field of data science is still in its infancy, and as such, there exist various
definitions of what it encompasses. Throughout this book we employ a very practical
definition by Mason & Wiggins (2010). They define data science according to the
following five steps: (1) obtaining data, (2) scrubbing data, (3) exploring data,
(4) modeling data, and (5) interpreting data. Together, these steps form the OSEMN
model (which is pronounced as "awesome"). This definition serves as the backbone of
this book because each step (except step 5, interpreting data) has its own chapter.
The following five subsections explain what each step entails.
Although the five steps are discussed in a linear and incremental
fashion, in practice it is very common to move back and forth
between them or to perform multiple steps at the same time. Doing
data science is an iterative and nonlinear process. For example, once
you have modeled your data and looked at the results, you may
decide to go back to the scrubbing step to adjust the features of the
data set.
Obtaining Data
Without any data, there is little data science you can do. So the first step is to obtain
data. Unless you are fortunate enough to already possess data, you may need to do
one or more of the following:
• Download data from another location (e.g., a web page or server)
• Query data from a database or API (e.g., MySQL or Twitter)
• Extract data from another file (e.g., an HTML file or spreadsheet)
• Generate data yourself (e.g., reading sensors or taking surveys)
In Chapter 3, we discuss several methods for obtaining data using the command line.
The obtained data will most likely be in plain text, CSV, JSON, or HTML/XML format.
The next step is to scrub this data.
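To make the formats above concrete, here is a minimal sketch in Python (not the
command-line tools the chapter goes on to discuss) of two of the routes listed
earlier: extracting data from text in CSV or JSON format, and generating data
yourself. The sample inputs, the field names, and the helper functions
rows_from_csv, rows_from_json, and generate_rows are all hypothetical and exist
only for illustration.

```python
import csv
import io
import json
import random

# Hypothetical sample inputs. In practice this text would come from a
# download, a database or API query, or a file on disk.
CSV_TEXT = "name,score\nalice,90\nbob,85\n"
JSON_TEXT = '[{"name": "carol", "score": 77}]'


def rows_from_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row.

    Every value stays a string; converting types belongs to scrubbing.
    """
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def generate_rows(n, seed=0):
    """Generate n synthetic rows, standing in for sensor or survey data."""
    rng = random.Random(seed)
    return [{"name": f"sensor{i}", "score": rng.randint(0, 100)} for i in range(n)]


# Combine rows obtained from all three sources into one list.
rows = rows_from_csv(CSV_TEXT) + rows_from_json(JSON_TEXT) + generate_rows(2)
for row in rows:
    print(row)
```

Note that the CSV rows carry their scores as strings while the JSON and
generated rows carry integers. Inconsistencies of this kind are one reason why
obtained data usually needs scrubbing before it can be explored.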