• Classifying images, video, or sounds (most often multiclass, with potentially very
many different classes)
• Assigning categories or tags to news articles, web pages, or other content
(multiclass)
• Discovering e-mail and web spam, network intrusions, and other malicious
behavior (binary or multiclass)
• Detecting failure situations, for example in computer systems or networks
• Ranking customers or users by the probability that they will purchase a
product or use a service (this can be framed as classification by predicting
probabilities and then ranking them in descending order)
• Predicting customers or users who might stop using a product, service, or
provider (called churn)
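The ranking use case above can be sketched in a few lines. This is a minimal plain-Python illustration rather than MLlib code; the logistic model, the weights, the bias, and the customer feature vectors are all hypothetical values chosen for the example.

```python
import math

def predict_probability(features, weights, bias):
    """Logistic model: probability of the positive class (e.g. 'will purchase')."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def rank_by_probability(customers, weights, bias):
    """Return (customer_id, probability) pairs sorted by descending probability."""
    scored = [
        (cid, predict_probability(features, weights, bias))
        for cid, features in customers.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical model parameters and customer features (assumptions, not real data).
weights = [0.8, -0.5]
bias = -0.2
customers = {
    "a": [1.0, 0.0],
    "b": [0.0, 2.0],
    "c": [2.0, 1.0],
}

ranking = rank_by_probability(customers, weights, bias)
for customer_id, probability in ranking:
    print(customer_id, round(probability, 3))
```

The classifier only supplies a score per customer; the ranking itself is an ordinary sort on that score, so any model that outputs a probability could be substituted for the logistic function used here.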
These are just a few possible use cases. In fact, it is probably safe to say that
classification is one of the most widely used machine learning and statistical
techniques in modern businesses, especially online businesses.
In this chapter, we will:
• Discuss the types of classification models available in MLlib
• Use Spark to extract the appropriate features from raw input data
• Train a number of classification models using MLlib
• Make predictions with our classification models
• Apply a number of standard evaluation techniques to assess the predictive
performance of our models
• Illustrate how to improve model performance using some of the feature-extraction
approaches from Chapter 3, Obtaining, Processing, and Preparing Data with
Spark
• Explore the impact of parameter tuning on model performance and learn how to
use cross-validation to select the best model parameters
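As a preview of the last point, the idea of choosing a parameter by cross-validation can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the MLlib API: the helper names, the toy threshold "model", and the data are hypothetical, and a real workflow would normally shuffle the data before splitting it into folds.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, k, train_and_score, param):
    """Mean validation score of one parameter value over k folds."""
    scores = []
    for held_out in k_fold_indices(len(data), k):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        valid = [data[i] for i in held_out]
        scores.append(train_and_score(train, valid, param))
    return sum(scores) / len(scores)

def select_best_param(data, k, train_and_score, candidates):
    """Pick the candidate with the highest mean cross-validated score."""
    return max(candidates, key=lambda p: cross_validate(data, k, train_and_score, p))

# Toy setup (hypothetical): the "model" is a fixed threshold on x, so no
# training happens; the score is the accuracy on the validation fold.
def threshold_accuracy(train, valid, threshold):
    correct = sum(1 for x, label in valid if (1 if x >= threshold else 0) == label)
    return correct / len(valid)

data = [(x, 1 if x >= 5 else 0) for x in range(10)]
best = select_best_param(data, k=5, train_and_score=threshold_accuracy,
                         candidates=[2, 5, 8])
print(best)
```

Each candidate is scored only on data held out from its training folds, which is what makes the comparison between parameter values a fair estimate of performance on unseen data.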