Accuracy and prediction error
The prediction error for binary classification is possibly the simplest measure available. It
is the number of training examples that are misclassified, divided by the total number of
examples. Similarly, accuracy is the number of correctly classified examples divided by the
total examples.
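As a minimal sketch of these two definitions, the following plain Scala snippet computes both measures on a small set of labels and predictions. The data and the names `labels`, `predictions`, `numCorrect`, and `predictionError` are hypothetical and chosen only for illustration; no Spark is needed here.

```scala
// Hypothetical toy data: true labels and predicted labels (1.0 or 0.0).
val labels      = Seq(1.0, 0.0, 1.0, 1.0, 0.0)
val predictions = Seq(1.0, 0.0, 0.0, 1.0, 1.0)

// Count the positions where the prediction matches the true label.
val numCorrect = labels.zip(predictions).count { case (label, pred) => label == pred }

// Accuracy: correctly classified examples divided by total examples.
val accuracy = numCorrect.toDouble / labels.size

// Prediction error: misclassified examples divided by total examples.
val predictionError = (labels.size - numCorrect).toDouble / labels.size
```

Since every example is either correctly classified or misclassified, the two measures always sum to 1.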
We can calculate the accuracy of our models on the training data by making a prediction for
each input feature vector and comparing it to the true label. We sum the number of correctly
classified instances and divide by the total number of data points to get the average
classification accuracy:
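val lrTotalCorrect = data.map { point =>
  // 1 if the predicted label matches the true label, 0 otherwise
  if (lrModel.predict(point.features) == point.label) 1 else 0
}.sum
// sum returns a Double here, so the division is floating-point
val lrAccuracy = lrTotalCorrect / data.count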
The output is as follows:
lrAccuracy: Double = 0.5146720757268425
This gives us 51.5 percent accuracy, which doesn't look particularly impressive! Our model
got only about half of the training examples correct, which seems to be about as good as
random chance.
Note
The raw output of a model is usually not exactly 1 or 0. It is typically a real number that
must be turned into a class prediction. This is done by applying a threshold to the
classifier's decision or scoring function.
For example, binary logistic regression is a probabilistic model that returns the estimated
probability of class 1 in its scoring function. Thus, a decision threshold of 0.5 is typical.
That is, if the estimated probability of being in class 1 is higher than 50 percent, the model
decides to classify the point as class 1; otherwise, it will be classified as class 0.
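The thresholding rule just described can be sketched in plain Scala as follows. The `classify` helper and the sample scores are hypothetical and only illustrate the rule; they are not part of any library API.

```scala
// Hypothetical helper: turn an estimated probability of class 1 into a class label.
// A probability strictly higher than the threshold is classified as class 1.
def classify(probability: Double, threshold: Double = 0.5): Double =
  if (probability > threshold) 1.0 else 0.0

// Hypothetical estimated probabilities of class 1 for four data points.
val scores = Seq(0.12, 0.50, 0.51, 0.93)

// Class predictions at the typical threshold of 0.5.
val atDefault = scores.map(p => classify(p))

// The same scores with a stricter threshold of 0.9.
val atStrict = scores.map(p => classify(p, threshold = 0.9))
```

With the 0.5 threshold, the last two points are assigned to class 1; raising the threshold to 0.9 leaves only the last point in class 1, even though the underlying scores have not changed.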
Note that the threshold itself is effectively a model parameter that can be tuned in some
models. It also plays a role in evaluation measures, as we will see now.