def trainDTWithParams(input: RDD[LabeledPoint], maxDepth: Int, impurity: Impurity) = {
  DecisionTree.train(input, Algo.Classification, impurity, maxDepth)
}
Now, we're ready to compute our AUC metric for different settings of tree depth. We will simply use our original dataset in this example, since we do not need the data to be standardized.
Tip
Note that decision tree models generally do not require features to be standardized or normalized, nor do they require categorical features to be binary-encoded.
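The reason is that tree splits simply compare a feature against a threshold, so any monotonic rescaling of a feature rescales the learned threshold without changing which side each point falls on. The following is a minimal sketch in plain Python (a hypothetical one-level decision stump with exhaustive threshold search, not Spark's implementation) illustrating this invariance:

```python
def fit_stump(xs, ys):
    """Fit a one-level decision tree (stump) on a single feature by
    exhaustive midpoint-threshold search, minimizing misclassifications."""
    order = sorted(xs)
    best = None  # (errors, threshold, left_label, right_label)
    for i in range(len(order) - 1):
        t = (order[i] + order[i + 1]) / 2.0  # candidate split: midpoint
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # majority-vote label on each side of the split
        l_lab = 1 if sum(left) * 2 > len(left) else 0
        r_lab = 1 if sum(right) * 2 > len(right) else 0
        errors = sum(1 for x, y in zip(xs, ys)
                     if (l_lab if x <= t else r_lab) != y)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    return best

def predict(stump, x):
    _, t, l_lab, r_lab = stump
    return l_lab if x <= t else r_lab

# Made-up toy data: one feature, two well-separated classes
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
raw = fit_stump(xs, ys)
scaled = fit_stump([x * 1000 for x in xs], ys)  # same feature, rescaled
# Both stumps make identical predictions on corresponding inputs:
same = all(predict(raw, x) == predict(scaled, x * 1000) for x in xs)
print(same)  # True: the threshold scales, the predictions do not
```

The raw stump learns a threshold of 6.5, the rescaled one 6500.0, but every prediction is identical; this is why standardization adds nothing for tree models.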
First, train the model using the Entropy impurity measure and varying tree depths:
val dtResultsEntropy = Seq(1, 2, 3, 4, 5, 10, 20).map { param =>
  val model = trainDTWithParams(data, param, Entropy)
  val scoreAndLabels = data.map { point =>
    val score = model.predict(point.features)
    (if (score > 0.5) 1.0 else 0.0, point.label)
  }
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  (s"$param tree depth", metrics.areaUnderROC)
}
dtResultsEntropy.foreach { case (param, auc) =>
  println(f"$param, AUC = ${auc * 100}%2.2f%%")
}
This should output the results shown here:
1 tree depth, AUC = 59.33%
2 tree depth, AUC = 61.68%
3 tree depth, AUC = 62.61%
4 tree depth, AUC = 63.63%
5 tree depth, AUC = 64.88%
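The areaUnderROC values above can be understood through AUC's rank interpretation: it is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. A minimal sketch in plain Python (not Spark's BinaryClassificationMetrics implementation; the score-label pairs are made up for illustration):

```python
def auc(score_and_labels):
    """Area under the ROC curve, computed from its probabilistic
    definition: the chance that a random positive outranks a random
    negative (ties count one half)."""
    pos = [s for s, l in score_and_labels if l == 1.0]
    neg = [s for s, l in score_and_labels if l == 0.0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical (score, label) pairs
pairs = [(0.9, 1.0), (0.8, 0.0), (0.4, 1.0), (0.3, 0.0)]
print(auc(pairs))  # 0.75: 3 of the 4 positive-negative pairs are ranked correctly
```

Note that thresholding the model's score to 0.0 or 1.0 before computing the metric, as the Scala code above does, collapses the ranking to two values, so the resulting AUC reflects a single operating point rather than the full ROC curve.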