CHAPTER 9
Modeling Data
In this chapter, we'll perform the fourth step of the OSEMN model (and the last step
to require a computer): modeling data. Generally speaking, to model data is to create
an abstract or higher-level description of your data. Just as with creating
visualizations, it's like taking a step back from the individual data points.
Visualizations, on the one hand, are characterized by shapes, positions, and colors
such that we can interpret them by looking at them. Models, on the other hand, are
internally characterized by a bunch of numbers, which means that computers can use
them, for example, to make predictions about new data points. (We can still visualize
models so that we can try to understand them and see how they are performing.)
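To make the idea of a model as "a bunch of numbers" concrete, here is a minimal Python sketch. It is not taken from this chapter: the function names `fit_line` and `predict` are hypothetical, the data points are made up, and ordinary least squares is used only as one simple illustration. The fitted model consists of just two numbers, a slope and an intercept, and those two numbers are all a computer needs to make a prediction about a new data point.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the model's two numbers to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept


# Made-up data points that lie exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)          # the whole model is (slope, intercept)
prediction = predict(model, 5.0)  # prediction for an unseen data point
print(model, prediction)
```

The individual data points are no longer needed once the two numbers are known, which is the sense in which a model is a higher-level description of the data.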
In this chapter, we'll consider four common types of algorithms to model data:
• Dimensionality reduction
• Clustering
• Regression
• Classification
These four types of algorithms come from the field of machine learning. As such,
we're going to change our vocabulary a bit. Let's assume that we have a CSV file, also
known as a data set. Each row, except for the header, is considered to be a data point.
For simplicity, we assume that each column that contains numerical values is an input
feature. If a data point also contains a nonnumerical field, such as the species column
in the Iris data set, then that is known as the data point's label.
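The vocabulary above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `is_number` and `split_features_and_labels` are hypothetical, the column names are chosen for this example, and the three rows are a tiny Iris-style sample rather than the full data set. Following the simplification in the text, every column whose values are all numerical is treated as a feature, and any remaining column is treated as a label.

```python
import csv
import io

# A tiny Iris-style sample; the column names are an assumption of this sketch.
CSV_TEXT = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""


def is_number(text):
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def split_features_and_labels(csv_text):
    """Split a CSV data set into numerical features and nonnumerical labels."""
    # Each row after the header is one data point.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys())
    # A column counts as a feature only if every value in it is numerical.
    feature_names = [c for c in columns if all(is_number(r[c]) for r in rows)]
    label_names = [c for c in columns if c not in feature_names]
    features = [[float(r[c]) for c in feature_names] for r in rows]
    labels = [[r[c] for c in label_names] for r in rows]
    return feature_names, label_names, features, labels


feature_names, label_names, features, labels = split_features_and_labels(CSV_TEXT)
print(feature_names)  # names of the input features
print(label_names)    # names of the label columns
print(features[0], labels[0])
```

Real data sets are messier than this rule suggests. A numerical column can be an identifier rather than a feature, for instance, so the split usually deserves a manual check.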
The first two types of algorithms (dimensionality reduction and clustering) are most
often unsupervised, which means that they create a model based on the features of
the data set only. The last two types of algorithms (regression and classification) are
 