Tuning model parameters
The previous section showed how model performance is affected by feature extraction and
selection, by the form of the input data, and by a model's assumptions about the data
distribution. So far, we have discussed model parameters only in passing, but they also
play a significant role in model performance.
MLlib's default train methods use default values for the parameters of each model. Let's
take a deeper look at them.
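For example, a minimal sketch of training with defaults (assuming a Spark context and an existing RDD of labeled points; this will not run outside a Spark application):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// 'data' is assumed to be an existing RDD[LabeledPoint].
// Passing only the number of iterations leaves stepSize, regParam,
// and miniBatchFraction at their default values.
val model = LogisticRegressionWithSGD.train(data, 10)
```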
Linear models
Both logistic regression and SVM share the same parameters, because they use the same
underlying optimization technique of stochastic gradient descent ( SGD ). They differ only
in the loss function applied. If we take a look at the class definition for logistic regression
in MLlib, we will see the following definition:
class LogisticRegressionWithSGD private (
    private var stepSize: Double,
    private var numIterations: Int,
    private var regParam: Double,
    private var miniBatchFraction: Double)
  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] ...
We can see that the arguments that can be passed to the constructor are stepSize,
numIterations, regParam, and miniBatchFraction. Of these, all except regParam
are related to the underlying optimization technique.
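To make the role of each argument concrete, here is a minimal, self-contained sketch of minibatch SGD for the logistic loss in plain Scala. It is not MLlib's implementation (MLlib operates on RDDs and delegates the update to an Updater class; the names SgdSketch and train here are purely illustrative, and the L2 term is folded directly into the step for brevity), but it shows where stepSize, numIterations, regParam, and miniBatchFraction each enter the computation:

```scala
import scala.util.Random

object SgdSketch {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // data: (label in {0, 1}, feature vector); returns the learned weights
  def train(
      data: Array[(Double, Array[Double])],
      stepSize: Double,          // base learning rate for each gradient step
      numIterations: Int,        // number of SGD iterations
      regParam: Double,          // L2 regularization strength
      miniBatchFraction: Double  // fraction of the data sampled per iteration
  ): Array[Double] = {
    val rng = new Random(42)
    val dims = data.head._2.length
    val w = Array.fill(dims)(0.0)
    for (i <- 1 to numIterations) {
      // Sample a minibatch; miniBatchFraction controls its expected size
      val batch = data.filter(_ => rng.nextDouble() < miniBatchFraction)
      if (batch.nonEmpty) {
        // Accumulate the logistic-loss gradient over the minibatch
        val grad = Array.fill(dims)(0.0)
        for ((label, x) <- batch) {
          val margin = w.zip(x).map { case (a, b) => a * b }.sum
          val err = sigmoid(margin) - label
          for (j <- 0 until dims) grad(j) += err * x(j)
        }
        // Step size decays as stepSize / sqrt(iteration), as in
        // MLlib's GradientDescent; the L2 penalty scales with regParam
        val thisStep = stepSize / math.sqrt(i)
        for (j <- 0 until dims)
          w(j) -= thisStep * (grad(j) / batch.length + regParam * w(j))
      }
    }
    w
  }
}
```

Increasing miniBatchFraction toward 1.0 makes each gradient estimate less noisy at the cost of more computation per iteration, while stepSize and numIterations trade off convergence speed against stability.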
The instantiation code for logistic regression initializes the Gradient, Updater, and
Optimizer and sets the relevant arguments for the Optimizer (GradientDescent in
this case):
private val gradient = new LogisticGradient()
private val updater = new SimpleUpdater()
override val optimizer = new GradientDescent(gradient, updater)
  .setStepSize(stepSize)
  .setNumIterations(numIterations)
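Because the optimizer is exposed as a public field, we can also override these parameters ourselves instead of relying on the defaults. A sketch of this usage, assuming MLlib 1.x (where the no-argument constructor is public and trainingData is an existing RDD[LabeledPoint]):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD

val lr = new LogisticRegressionWithSGD()
// Override the optimizer's settings before training
lr.optimizer
  .setStepSize(0.1)
  .setNumIterations(200)
  .setRegParam(0.01)
  .setMiniBatchFraction(1.0)
// val model = lr.run(trainingData)
```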