MLlib comes with a set of built-in routines to compute the area under the PR and ROC curves for binary classification. Here, we will compute these metrics for each of our models:
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

val metrics = Seq(lrModel, svmModel).map { model =>
  val scoreAndLabels = data.map { point =>
    (model.predict(point.features), point.label)
  }
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  (model.getClass.getSimpleName, metrics.areaUnderPR, metrics.areaUnderROC)
}
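Each element of metrics is a tuple of the model name and the two area measures, so the results can be inspected directly. The following is a minimal sketch, assuming we simply format each area as a percentage:

// Print each (model name, area under PR, area under ROC) tuple,
// formatting the areas as percentages (the formatting is illustrative)
metrics.foreach { case (m, pr, roc) =>
  println(f"$m, Area under PR: ${pr * 100.0}%2.4f%%, Area under ROC: ${roc * 100.0}%2.4f%%")
}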
As we did previously when training the naïve Bayes model and computing its accuracy, we need to use the special nbData version of the dataset that we created to compute the classification metrics:
val nbMetrics = Seq(nbModel).map { model =>
  val scoreAndLabels = nbData.map { point =>
    val score = model.predict(point.features)
    (if (score > 0.5) 1.0 else 0.0, point.label)
  }
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  (model.getClass.getSimpleName, metrics.areaUnderPR, metrics.areaUnderROC)
}
Note that because DecisionTreeModel does not implement the ClassificationModel interface that is implemented by the other three models, we need to compute the results separately for this model in the following code:
val dtMetrics = Seq(dtModel).map { model =>
  val scoreAndLabels = data.map { point =>
    val score = model.predict(point.features)
    (if (score > 0.5) 1.0 else 0.0, point.label)
  }
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  (model.getClass.getSimpleName, metrics.areaUnderPR, metrics.areaUnderROC)
}
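With all three result sequences computed, we can concatenate them and print the models side by side. This is a minimal sketch, reusing the same percentage formatting as before:

// Combine the result tuples from all four models for comparison
val allMetrics = metrics ++ nbMetrics ++ dtMetrics
allMetrics.foreach { case (m, pr, roc) =>
  println(f"$m, Area under PR: ${pr * 100.0}%2.4f%%, Area under ROC: ${roc * 100.0}%2.4f%%")
}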