.setRegParam(regParam)
.setMiniBatchFraction(miniBatchFraction)
LogisticGradient sets up the logistic loss function that defines our logistic regression model.
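For reference, the logistic loss being minimized here (using the y ∈ {−1, +1} label convention from the MLlib optimization documentation) is L(w; x, y) = log(1 + exp(−y wᵀx)).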
Tip
While a detailed treatment of optimization techniques is beyond the scope of this book, MLlib provides two optimizers for linear models: SGD and L-BFGS. L-BFGS is often more accurate and has fewer parameters to tune.
SGD is the default, while L-BFGS can currently only be used directly for logistic regression via LogisticRegressionWithLBFGS. Try it out yourself and compare the results to those found with SGD.
See http://spark.apache.org/docs/latest/mllib-optimization.html for further details.
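As a minimal sketch of the L-BFGS alternative (the data variable name is an assumption, standing in for the RDD[LabeledPoint] of training data prepared elsewhere in the chapter):
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

// L-BFGS needs neither a step size nor a mini-batch fraction
val lbfgsModel = new LogisticRegressionWithLBFGS().setNumClasses(2).run(data)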
To investigate the impact of the remaining parameter settings, we will create a helper
function that will train a logistic regression model, given a set of parameter inputs. First,
we will import the required classes:
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.optimization.Updater
import org.apache.spark.mllib.optimization.SimpleUpdater
import org.apache.spark.mllib.optimization.L1Updater
import org.apache.spark.mllib.optimization.SquaredL2Updater
import org.apache.spark.mllib.classification.ClassificationModel
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
Next, we will define our helper function to train a model given a set of inputs:
def trainWithParams(input: RDD[LabeledPoint], regParam: Double,
    numIterations: Int, updater: Updater, stepSize: Double) = {
  // Configure the SGD optimizer with the supplied settings before training
  val lr = new LogisticRegressionWithSGD
  lr.optimizer.setNumIterations(numIterations).
    setUpdater(updater).setRegParam(regParam).setStepSize(stepSize)
  lr.run(input)
}
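As an illustrative usage sketch (the data name and the specific parameter values here are assumptions, not values from the text), the helper can be called once for each of the updaters imported above:
// SimpleUpdater applies no regularization; L1Updater and
// SquaredL2Updater apply L1 and L2 regularization respectively
val nilModel = trainWithParams(data, 0.0, 10, new SimpleUpdater, 1.0)
val l1Model = trainWithParams(data, 1.0, 10, new L1Updater, 1.0)
val l2Model = trainWithParams(data, 1.0, 10, new SquaredL2Updater, 1.0)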