Chapter 6. Building a Regression Model with Spark
In this chapter, we will build on what we covered in Chapter 5, Building a Classification
Model with Spark. While classification models deal with outcomes that represent discrete
classes, regression models are concerned with target variables that can take any real value.
The underlying principle is very similar: we wish to find a model that maps input features
to predicted target variables. Like classification, regression is also a form of supervised
learning.
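To make the idea of mapping input features to a real-valued target concrete, here is a minimal sketch in plain Python (not MLlib). It fits a straight line to a single feature with ordinary least squares and then predicts a value for an unseen input. The toy data and the helper names fit_line and predict are made up for illustration.

```python
# Toy data: one input feature x and a real-valued target y.
# The values are invented for illustration and follow y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(xs, ys)
prediction = predict(model, 6.0)  # a real number, not a class label
print(model, prediction)
```

Unlike a classifier, the model returns a value on a continuous scale, so its quality is judged by how far predictions fall from the true targets rather than by how often a label is correct.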
Regression models can be used to predict just about any variable of interest. A few
examples include the following:
• Predicting stock returns and other economic variables
• Predicting loss amounts for loan defaults (this can be combined with a classification
model that predicts the probability of default, while the regression model
predicts the amount in the case of a default)
• Recommendations (the Alternating Least Squares factorization model from
Chapter 4, Building a Recommendation Engine with Spark, uses linear regression
in each iteration)
• Predicting customer lifetime value (CLTV) in a retail, mobile, or other business,
based on user behavior and spending patterns
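The loan default example above combines two models. One common way to combine them, assumed here rather than taken from the text, is to multiply the classifier's predicted probability of default by the regressor's predicted loss amount, giving an expected loss per loan. The function name expected_loss and the numbers below are hypothetical, and the two inputs stand in for the outputs of already trained models.

```python
def expected_loss(p_default, loss_given_default):
    """Expected loss = P(default) * predicted loss amount if a default occurs.

    p_default would come from a classification model and loss_given_default
    from a regression model; both are plain numbers in this sketch.
    """
    if not 0.0 <= p_default <= 1.0:
        raise ValueError("p_default must be a probability in [0, 1]")
    return p_default * loss_given_default


# Hypothetical model outputs for two loans: (probability of default, loss amount).
loans = [(0.25, 8000.0), (0.5, 2000.0)]
losses = [expected_loss(p, amount) for p, amount in loans]
print(losses)
```

Keeping the two models separate lets each be trained on the data it suits: the classifier on all loans, and the regressor only on loans that actually defaulted.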
In the following sections, we will:
• Introduce the various types of regression models available in MLlib
• Explore feature extraction and target variable transformation for regression models
• Train a number of regression models using MLlib
• See how to make predictions using the trained models
• Use cross-validation to investigate how various parameter settings affect the
performance of regression models
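As a preview of the last item, the following is a minimal sketch of k-fold cross-validation for comparing parameter settings. It uses plain Python rather than MLlib, a deliberately simple one-feature ridge regression through the origin, and root mean squared error (RMSE) as the metric. The data, the helper names, and the candidate regularization values are all assumptions made for illustration.

```python
import math


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i holds out every k-th point from i."""
    for i in range(k):
        test = [j for j in range(n) if j % k == i]
        train = [j for j in range(n) if j % k != i]
        yield train, test


def fit_ridge_1d(xs, ys, reg):
    """Closed-form ridge regression through the origin for a single feature."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + reg)


def rmse(w, xs, ys):
    """Root mean squared error of the predictions w * x against the targets."""
    return math.sqrt(sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def cross_validate(xs, ys, reg, k=3):
    """Average held-out RMSE over k folds for one regularization setting."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        w = fit_ridge_1d([xs[j] for j in train], [ys[j] for j in train], reg)
        scores.append(rmse(w, [xs[j] for j in test], [ys[j] for j in test]))
    return sum(scores) / len(scores)


# Made-up data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

# Evaluate each candidate setting on held-out data and keep the best one.
results = {reg: cross_validate(xs, ys, reg) for reg in (0.0, 1.0, 100.0)}
best_reg = min(results, key=results.get)
print(results, best_reg)
```

Because every score is computed on points the model did not see during fitting, the comparison reflects how each setting generalizes rather than how well it memorizes the training data.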