  }
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  (model.getClass.getSimpleName, metrics.areaUnderPR, metrics.areaUnderROC)
}
val allMetrics = metrics ++ nbMetrics ++ dtMetrics
allMetrics.foreach { case (m, pr, roc) =>
  println(f"$m, Area under PR: ${pr * 100.0}%2.4f%%, Area under ROC: ${roc * 100.0}%2.4f%%")
}
Your output will look similar to the one here:
LogisticRegressionModel, Area under PR: 75.6759%, Area under ROC: 50.1418%
SVMModel, Area under PR: 75.6759%, Area under ROC: 50.1418%
NaiveBayesModel, Area under PR: 68.0851%, Area under ROC: 58.3559%
DecisionTreeModel, Area under PR: 74.3081%, Area under ROC: 64.8837%
We can see that all models achieve broadly similar results for the average precision metric.
Logistic regression and SVM achieve results of around 0.5 for AUC. This indicates that
they do no better than random chance! Our naïve Bayes and decision tree models fare a
little better, achieving an AUC of 0.58 and 0.65, respectively. Still, this is not a very good
result in terms of binary classification performance.
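To see why an AUC of 0.5 signals a model no better than chance, recall that the area under the ROC curve equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (with ties counting as 0.5). Here is a minimal sketch in plain Python (no Spark; the data is made up for illustration) that computes AUC directly from (score, label) pairs: a model that assigns every example the same score lands at exactly 0.5, while perfectly separating scores give 1.0.

```python
def auc_roc(score_and_labels):
    """AUC = P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, l in score_and_labels if l == 1.0]
    neg = [s for s, l in score_and_labels if l == 0.0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# An uninformative model: the same score for every example -> AUC = 0.5
constant = [(1.0, 1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
print(auc_roc(constant))  # 0.5

# Scores that perfectly separate the classes -> AUC = 1.0
perfect = [(0.9, 1.0), (0.8, 1.0), (0.3, 0.0), (0.1, 0.0)]
print(auc_roc(perfect))  # 1.0
```

This pairwise-comparison view also explains why the raw score scale does not matter for AUC, only the ranking of positives relative to negatives.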
Note
While we don't cover multiclass classification here, MLlib provides a similar evaluation class called MulticlassMetrics, which provides averaged versions of many common metrics.
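To give a flavor of what "averaged" means in the multiclass setting, here is a minimal sketch in plain Python (the data and function name are made up for illustration) of macro-averaged precision: precision is computed per class from (prediction, label) pairs and then averaged with equal weight per class.

```python
def macro_precision(pred_and_labels):
    """Average of per-class precision, each class weighted equally."""
    classes = {label for _, label in pred_and_labels}
    per_class = []
    for c in classes:
        tp = sum(1 for p, l in pred_and_labels if p == c and l == c)
        fp = sum(1 for p, l in pred_and_labels if p == c and l != c)
        per_class.append(tp / (tp + fp) if tp + fp > 0 else 0.0)
    return sum(per_class) / len(per_class)

# (prediction, label) pairs for a 3-class toy problem
preds = [(0, 0), (1, 1), (1, 0), (2, 2), (2, 2), (0, 1)]
print(round(macro_precision(preds), 4))  # 0.6667
```

Other averaging schemes exist as well, for example weighting each class by its number of examples rather than equally.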