Building a Regression Model with Spark - Machine Learning with Spark

Training and using regression models

Training regression models with decision trees and linear models follows the same procedure as for classification models. We simply pass the training data, contained in an RDD[LabeledPoint], to the relevant train method. Note that in Scala, if we want to customize the various model parameters (such as regularization and step size for the SGD optimizer), we must instantiate a new model instance and use its optimizer field to access the available parameter setters.

In Python, we are provided with a convenience method that gives us access to all the available model arguments, so we only have to use this one entry point for training. We can see the details of these convenience functions by importing the relevant modules and then calling the help function on the train methods:

from pyspark.mllib.regression import LinearRegressionWithSGD
from pyspark.mllib.tree import DecisionTree
help(LinearRegressionWithSGD.train)

Doing this for the linear model outputs the following documentation:

Figure: Linear regression help documentation
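To make the roles of the step size and regularization arguments more concrete, the following is a minimal pure-Python sketch of gradient descent for a linear model. It is not MLlib's implementation: the function name train_linear_sgd, the argument names iterations, step, and reg_param, the use of the full dataset for each gradient, the L2 penalty, and the step / sqrt(t) decay schedule are all assumptions made for illustration.

```python
import math


def train_linear_sgd(data, iterations=100, step=0.1, reg_param=0.0):
    """Fit weights for a linear model without an intercept (illustrative only).

    data      -- list of (label, features) pairs, features being a list of floats
    step      -- initial step size; decayed as step / sqrt(t) (an assumption)
    reg_param -- strength of an L2 penalty on the weights (an assumption)
    """
    n_features = len(data[0][1])
    weights = [0.0] * n_features
    for t in range(1, iterations + 1):
        # Accumulate the squared-error gradient over all examples.
        grad = [0.0] * n_features
        for label, features in data:
            error = sum(w * x for w, x in zip(weights, features)) - label
            for j, x in enumerate(features):
                grad[j] += error * x
        step_t = step / math.sqrt(t)
        # Average gradient plus the L2 term, scaled by the current step size.
        weights = [
            w - step_t * (g / len(data) + reg_param * w)
            for w, g in zip(weights, grad)
        ]
    return weights


# Hypothetical toy data generated from label = 2 * x.
toy_data = [(2.0 * x, [float(x)]) for x in range(1, 5)]

plain = train_linear_sgd(toy_data, iterations=100, step=0.1, reg_param=0.0)
shrunk = train_linear_sgd(toy_data, iterations=100, step=0.1, reg_param=1.0)
print(plain, shrunk)
```

On this toy data the unregularized weight approaches 2.0, while a nonzero reg_param pulls the learned weight toward zero. A step size that is too large for the scale of the features makes the updates diverge, which is one reason these arguments are worth inspecting before training.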
