Linear regression

Linear regression is one of the most common methods for regression, predicting the output variable as a linear combination of the features. MLlib also supports L1 and L2 regularized regression, commonly known as Lasso and ridge regression.
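For reference, the objectives these three methods minimize can be written as follows (this formulation is standard but not spelled out in the text; λ plays the role of the regParam parameter described below):

```latex
\min_{w} \sum_i \bigl(w^\top x_i - y_i\bigr)^2
  \quad \text{(ordinary least squares)}

\min_{w} \sum_i \bigl(w^\top x_i - y_i\bigr)^2 + \lambda \lVert w \rVert_1
  \quad \text{(Lasso, $L_1$)}

\min_{w} \sum_i \bigl(w^\top x_i - y_i\bigr)^2 + \lambda \lVert w \rVert_2^2
  \quad \text{(ridge, $L_2$)}
```

Here w is the weight vector, and each training example has feature vector x_i and label y_i.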
The linear regression algorithms are available through the mllib.regression.LinearRegressionWithSGD, LassoWithSGD, and RidgeRegressionWithSGD classes. These follow a common naming pattern throughout MLlib, where problems involving multiple algorithms have a "With" part in the class name to specify the algorithm used. Here, SGD is Stochastic Gradient Descent.
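MLlib's trainers hide the optimization loop, so as a rough illustration of what Stochastic Gradient Descent does for least-squares linear regression, here is a minimal, self-contained sketch in plain Python (this is not MLlib code; the function name and toy data are made up for the example):

```python
# Illustrative sketch of SGD for least-squares linear regression (no MLlib).
import random

def sgd_linear_regression(points, num_iterations=100, step_size=0.01):
    """points: list of (features, label) pairs; returns learned weights."""
    dim = len(points[0][0])
    weights = [0.0] * dim
    rng = random.Random(42)  # fixed seed so runs are reproducible
    for _ in range(num_iterations):
        x, y = rng.choice(points)  # one randomly chosen example per update
        pred = sum(w * xi for w, xi in zip(weights, x))
        err = pred - y  # derivative of (1/2)*(pred - y)^2 with respect to pred
        # Step each weight against the gradient of the squared error.
        weights = [w - step_size * err * xi for w, xi in zip(weights, x)]
    return weights

# Toy one-feature data generated from y = 2*x; the learned weight
# approaches 2.0 as the iterations proceed.
data = [([x], 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5]]
w = sgd_linear_regression(data, num_iterations=2000, step_size=0.05)
```

MLlib performs conceptually similar updates, but distributes the gradient computation over the RDD's partitions.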
These classes all have several parameters to tune the algorithm:

numIterations
    Number of iterations to run (default: 100).
stepSize
    Step size for gradient descent (default: 1.0).
intercept
    Whether to add an intercept or bias feature to the data, that is, another feature whose value is always 1 (default: false).
regParam
    Regularization parameter for Lasso and ridge (default: 1.0).
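To make regParam's role concrete, the sketch below shows how the regularization term changes a single SGD update for ridge (L2) and Lasso (L1). This is plain illustrative Python, not MLlib's implementation; the function names are invented for the example:

```python
# Illustration of how reg_param enters one SGD update on one example (x, y).

def _sign(v):
    """Sign of v: -1, 0, or 1 (subgradient of |v|)."""
    return (v > 0) - (v < 0)

def ridge_sgd_step(weights, x, y, step_size, reg_param):
    """One SGD step for squared loss plus (reg_param) * ||w||_2^2 / 2."""
    err = sum(w * xi for w, xi in zip(weights, x)) - y
    # L2 regularization adds reg_param * w to the gradient,
    # shrinking every weight toward zero proportionally.
    return [w - step_size * (err * xi + reg_param * w)
            for w, xi in zip(weights, x)]

def lasso_sgd_step(weights, x, y, step_size, reg_param):
    """One SGD step for squared loss plus reg_param * ||w||_1."""
    err = sum(w * xi for w, xi in zip(weights, x)) - y
    # L1 regularization adds reg_param * sign(w), a constant-size push
    # toward zero, which tends to drive small weights exactly to zero.
    return [w - step_size * (err * xi + reg_param * _sign(w))
            for w, xi in zip(weights, x)]
```

With reg_param set to 0, both reduce to the plain least-squares update used by LinearRegressionWithSGD.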
The way to call the algorithms differs slightly by language. In Java and Scala, you create a LinearRegressionWithSGD object, call setter methods on it to set the parameters, and then call run() to train a model. In Python, you instead use the class method LinearRegressionWithSGD.train(), to which you pass key/value parameters. In both cases, you pass in an RDD of LabeledPoints, as shown in Examples 11-10 through 11-12.
Example 11-10. Linear regression in Python

from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.regression import LinearRegressionWithSGD

points = # (create RDD of LabeledPoint)
model = LinearRegressionWithSGD.train(points, iterations=200, intercept=True)
print("weights: %s, intercept: %s" % (model.weights, model.intercept))
Example 11-11. Linear regression in Scala

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.LinearRegressionWithSGD