Building a Classification Model with Spark - Machine Learning with Spark


• Classifying images, video, or sounds (most often multiclass, with potentially very many different classes)

• Assigning categories or tags to news articles, web pages, or other content (multiclass)

• Discovering e-mail and web spam, network intrusions, and other malicious behavior (binary or multiclass)

• Detecting failure situations, for example in computer systems or networks

• Ranking customers or users by the probability that they will purchase a product or use a service (this can be framed as classification by predicting probabilities and then ranking them in descending order)

• Predicting customers or users who might stop using a product, service, or provider (called churn)

These are just a few possible use cases. In fact, it is probably safe to say that classification is one of the most widely used machine learning and statistical techniques in modern businesses, especially online businesses.
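
To make the ranking use case above concrete, here is a minimal sketch in plain Python (not Spark or MLlib code). It assumes a logistic model whose weights have already been learned; the function names, customer names, feature values, weights, and bias are all hypothetical and chosen only for illustration. The sketch scores each customer with a predicted purchase probability and then sorts in descending order.

```python
import math

def purchase_probability(features, weights, bias):
    """Logistic model: map a weighted sum of features to a probability in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def rank_by_probability(customers, weights, bias):
    """Score every customer, then sort from most to least likely to purchase."""
    scored = [
        (name, purchase_probability(features, weights, bias))
        for name, features in customers.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical data: two made-up features per customer.
customers = {
    "alice": [3.0, 1.0],
    "bob": [0.0, 0.0],
    "carol": [1.0, 5.0],
}
weights = [0.8, -0.5]  # assumed to come from an already trained model
bias = -1.0

ranking = rank_by_probability(customers, weights, bias)
for name, probability in ranking:
    print(f"{name}: {probability:.3f}")
```

The classifier itself only produces a probability per customer; the ranking is simply a sort applied to those probabilities afterwards.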

In this chapter, we will:

• Discuss the types of classification models available in MLlib

• Use Spark to extract the appropriate features from raw input data

• Train a number of classification models using MLlib

• Make predictions with our classification models

• Apply a number of standard evaluation techniques to assess the predictive performance of our models

• Illustrate how to improve model performance using some of the feature-extraction approaches from Chapter 3, Obtaining, Processing, and Preparing Data with Spark

• Explore the impact of parameter tuning on model performance and learn how to use cross-validation to select the best model parameters
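
As a preview of the last point, the idea behind using cross-validation to select a parameter can be sketched in plain Python. This is only an illustration under stated assumptions, not the MLlib API: a tiny one-dimensional nearest-neighbor classifier stands in for a real model, the data is invented, and names such as `cross_val_accuracy` and `select_best` are hypothetical. Each candidate parameter value is scored by its average accuracy on held-out folds, and the candidate with the highest average is kept.

```python
from collections import Counter

def knn_predict(train, x, num_neighbors):
    """Predict a label by majority vote among the nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:num_neighbors]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cross_val_accuracy(data, num_neighbors, num_folds):
    """Average held-out accuracy over num_folds train/test splits."""
    folds = [data[i::num_folds] for i in range(num_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one being held out.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        correct = sum(
            1 for x, y in held_out if knn_predict(train, x, num_neighbors) == y
        )
        scores.append(correct / len(held_out))
    return sum(scores) / num_folds

def select_best(data, candidates, num_folds):
    """Return the candidate with the highest cross-validated accuracy."""
    scores = {c: cross_val_accuracy(data, c, num_folds) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Invented toy data: (feature, label) pairs with two well-separated classes.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.5, 0), (0.6, 0),
        (1.1, 1), (1.2, 1), (1.3, 1), (1.4, 1), (1.5, 1), (1.6, 1)]

best, scores = select_best(data, candidates=[1, 3, 5], num_folds=3)
print(best, scores)
```

The key design point is that each score is computed on data the model did not see during training, so the selected parameter reflects expected performance on new data rather than fit to the training set.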
