import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.LinearRegressionWithSGD

val points: RDD[LabeledPoint] = // ...
val lr = new LinearRegressionWithSGD().setNumIterations(200).setIntercept(true)
val model = lr.run(points)
println("weights: %s, intercept: %s".format(model.weights, model.intercept))
Example 11-12. Linear regression in Java
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
import org.apache.spark.mllib.regression.LinearRegressionModel;

JavaRDD<LabeledPoint> points = // ...
LinearRegressionWithSGD lr =
    new LinearRegressionWithSGD().setNumIterations(200).setIntercept(true);
LinearRegressionModel model = lr.run(points.rdd());
System.out.printf("weights: %s, intercept: %s\n",
    model.weights(), model.intercept());
Note that in Java, we need to convert our JavaRDD to the Scala RDD class by calling .rdd() on it. This is a common pattern throughout MLlib because the MLlib methods are designed to be callable from both Java and Scala.
Once trained, the LinearRegressionModel returned in all languages includes a predict() function that can be used to predict a value on a single vector. The RidgeRegressionWithSGD and LassoWithSGD classes behave similarly and return a similar model class. Indeed, this pattern of an algorithm with parameters adjusted through setters, which returns a Model object with a predict() method, is common in all of MLlib.
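For instance, the trained model from the linear regression example above could be applied to a new point like this (a minimal sketch; the feature values here are made up for illustration, and `model` is assumed to come from the previous example):

```scala
import org.apache.spark.mllib.linalg.Vectors

// Predict the target value for a single (hypothetical) feature vector.
// predict() takes an mllib.linalg.Vector and returns a Double.
val prediction = model.predict(Vectors.dense(1.0, 2.0, 3.0))
println("prediction: %s".format(prediction))
```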
Logistic regression
Logistic regression is a binary classification method that identifies a linear separating plane between positive and negative examples. In MLlib, it takes LabeledPoints with label 0 or 1 and returns a LogisticRegressionModel that can predict new points.
The logistic regression algorithm has a very similar API to linear regression, covered in the previous section. One difference is that there are two algorithms available for solving it: SGD and LBFGS.4 LBFGS is generally the best choice, but is not available in some earlier versions of MLlib (before Spark 1.2). These algorithms are available in the mllib.classification.LogisticRegressionWithLBFGS and WithSGD classes, which have interfaces similar to LinearRegressionWithSGD. They take all the same parameters as linear regression (see the previous section).
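As a sketch, training with the LBFGS-based class looks much like the linear regression example (assuming an RDD of LabeledPoints whose labels are 0 or 1; the variable names are illustrative):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.regression.LabeledPoint

val points: RDD[LabeledPoint] = // ... labeled 0 or 1
val lr = new LogisticRegressionWithLBFGS().setIntercept(true)
// run() returns a LogisticRegressionModel, whose predict() yields 0.0 or 1.0
val model = lr.run(points)
```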
4 LBFGS is an approximation to Newton's method that converges in fewer iterations than stochastic gradient descent. It is described at http://en.wikipedia.org/wiki/Limited-memory_BFGS.