.setRegParam(regParam)
.setMiniBatchFraction(miniBatchFraction)
LogisticGradient sets up the logistic loss function that defines our logistic regression model.
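For reference, the logistic loss being minimized here (using the y ∈ {−1, +1} label convention from the MLlib optimization documentation) is L(w; x, y) = log(1 + exp(−y wᵀx)).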
Tip
While a detailed treatment of optimization techniques is beyond the scope of this book, MLlib provides two optimizers for linear models: SGD and L-BFGS. L-BFGS is often more accurate and has fewer parameters to tune.
SGD is the default, while L-BFGS can currently only be used directly for logistic regression via LogisticRegressionWithLBFGS. Try it out yourself and compare the results to those found with SGD.
See http://spark.apache.org/docs/latest/mllib-optimization.html for further details.
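As a minimal sketch of the L-BFGS alternative (the data variable name is an assumption, standing in for the RDD[LabeledPoint] of training data prepared elsewhere in the chapter):
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

// L-BFGS needs neither a step size nor a mini-batch fraction
val lbfgsModel = new LogisticRegressionWithLBFGS().setNumClasses(2).run(data)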
To investigate the impact of the remaining parameter settings, we will create a helper
function that will train a logistic regression model, given a set of parameter inputs. First,
we will import the required classes:
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.optimization.Updater
import org.apache.spark.mllib.optimization.SimpleUpdater
import org.apache.spark.mllib.optimization.L1Updater
import org.apache.spark.mllib.optimization.SquaredL2Updater
import org.apache.spark.mllib.classification.ClassificationModel
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
Next, we will define our helper function to train a model given a set of inputs:
def trainWithParams(input: RDD[LabeledPoint], regParam: Double,
    numIterations: Int, updater: Updater, stepSize: Double) = {
  // Configure the SGD optimizer with the supplied settings before training
  val lr = new LogisticRegressionWithSGD
  lr.optimizer.setNumIterations(numIterations).
    setUpdater(updater).setRegParam(regParam).setStepSize(stepSize)
  lr.run(input)
}
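As an illustrative usage sketch (the data name and the specific parameter values here are assumptions, not values from the text), the helper can be called once for each of the updaters imported above:
// SimpleUpdater applies no regularization; L1Updater and
// SquaredL2Updater apply L1 and L2 regularization respectively
val nilModel = trainWithParams(data, 0.0, 10, new SimpleUpdater, 1.0)
val l1Model = trainWithParams(data, 1.0, 10, new L1Updater, 1.0)
val l2Model = trainWithParams(data, 1.0, 10, new SquaredL2Updater, 1.0)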