Building a Regression Model with Spark - Machine Learning with Spark

Chapter 6. Building a Regression Model with Spark

In this chapter, we will build on what we covered in Chapter 5, Building a Classification Model with Spark. While classification models deal with outcomes that represent discrete classes, regression models are concerned with target variables that can take any real value. The underlying principle is very similar: we wish to find a model that maps input features to predicted target variables. Like classification, regression is also a form of supervised learning.
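To make the idea of mapping input features to a real-valued target concrete, here is a minimal sketch in plain Python rather than MLlib. It fits a one-feature least-squares line to a tiny made-up dataset; the data and the function names fit_line and predict are assumptions chosen for illustration and do not come from the chapter.

```python
# Toy data, invented for illustration: one input feature x and a
# real-valued target y generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Map an input feature to a predicted real-valued target."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(xs, ys)
prediction = predict(model, 2.5)  # a real number, not a class label
```

Unlike a classifier, which would return one of a fixed set of labels, predict can return any real number, including values for inputs that never appeared in the training data.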

Regression models can be used to predict just about any variable of interest. A few examples include the following:

• Predicting stock returns and other economic variables

• Predicting loss amounts for loan defaults (this can be combined with a classification model that predicts the probability of default, while the regression model predicts the amount in the case of a default)

• Recommendations (the Alternating Least Squares factorization model from Chapter 4, Building a Recommendation Engine with Spark, uses linear regression in each iteration)

• Predicting customer lifetime value (CLTV) in a retail, mobile, or other business, based on user behavior and spending patterns
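The loan-default example combines two models. One common way to do this, sketched below under stated assumptions, is to multiply the classifier's predicted probability of default by the regression model's predicted loss amount to get an expected loss. The function name expected_loss and the numbers used are hypothetical; the chapter does not prescribe this exact formula.

```python
def expected_loss(p_default, loss_given_default):
    """Combine a classifier output with a regression output.

    p_default: predicted probability of default (from a classification model).
    loss_given_default: predicted loss amount if a default occurs
        (from a regression model).
    """
    if not 0.0 <= p_default <= 1.0:
        raise ValueError("p_default must be a probability in [0, 1]")
    return p_default * loss_given_default


# Hypothetical model outputs for a single loan.
loan_expected_loss = expected_loss(0.25, 8000.0)
```

Keeping the two models separate lets each be trained and evaluated with metrics suited to its own task: classification metrics for the default probability and regression metrics for the loss amount.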

In the following sections, we will:

• Introduce the various types of regression models available in MLlib

• Explore feature extraction and target variable transformation for regression models

• Train a number of regression models using MLlib

• See how to make predictions using the trained models

• Investigate, using cross-validation, how various parameter settings affect regression performance
