Chapter 7. Building a Clustering Model with Spark
In the last few chapters, we covered supervised learning methods, where the training data is
labeled with the true outcome that we would like to predict (for example, a rating for
recommendations, a class assignment for classification, or a real-valued target variable in
the case of regression).
Next, we will consider the case where we do not have labeled data available. This is called
unsupervised learning, as the model is not supervised with the true target label. The
unsupervised case is very common in practice, since obtaining labeled training data can be
very difficult or expensive in many real-world scenarios (for example, having humans label
training data with class labels for classification). However, we would still like to learn
some underlying structure in the data and use it to make predictions.
This is where unsupervised learning approaches can be useful. Unsupervised learning models
are also often combined with supervised models, for example, by applying unsupervised
techniques to create new input features for supervised models.
Clustering models are, in many ways, the unsupervised equivalent of classification models.
With classification, we tried to learn a model that would predict which class a given
training example belonged to. The model was essentially a mapping from a set of features to
the class.
In clustering, we would like to segment the data such that each training example is assigned
to a segment called a cluster. The clusters act much like classes, except that the true class
assignments are unknown.
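
As a rough illustration of what assigning each example to a cluster means, the following
minimal sketch maps unlabeled feature vectors to the nearest of a set of cluster centers.
It is plain Python rather than Spark code; the data points, the centers, and the
nearest_cluster helper are all hypothetical and chosen only for illustration. A real
clustering model would learn the centers from the data instead of being given them.

```python
import math


def nearest_cluster(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, center) for center in centers]
    return distances.index(min(distances))


# Hypothetical 2-D feature vectors; note that no class labels are attached.
points = [(1.0, 1.2), (0.8, 0.9), (5.1, 4.9), (4.8, 5.3)]

# Hypothetical cluster centers, assumed here rather than learned from the data.
centers = [(1.0, 1.0), (5.0, 5.0)]

# Each example receives a cluster index, which plays the role of a class label.
assignments = [nearest_cluster(p, centers) for p in points]
print(assignments)  # prints [0, 0, 1, 1]
```

The cluster index behaves like a predicted class, but nothing in the input told us which
group each point should belong to.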
Clustering models share many use cases with classification models. These include the
following:
• Segmenting users or customers into different groups based on behavioral
characteristics and metadata
• Grouping content on a website or products in a retail business
• Finding clusters of similar genes
• Segmenting communities in ecology
• Creating image segments for use in image analysis applications such as object
detection