This changes, however, when we use the predictions of the Naive Bayes classifier to determine whether someone has a high or a low income:
Table 14.4 The gender-predicted income contingency table for the test-set, assigned by a Naive Bayes classifier

          Low income   High income
Female          5094           327
Male            8731          2129
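The discrimination score used throughout this example is simply the gap in positive-class (high income) probability between males and females. A minimal Python sketch (the function name is ours, not the chapter's), applied to the counts in Table 14.4:

```python
def discrimination(low_f, high_f, low_m, high_m):
    """Positive-class probability for males minus that for females,
    given the four cells of a gender/income contingency table."""
    return high_m / (low_m + high_m) - high_f / (low_f + high_f)

# Counts from Table 14.4 (rows: Female, Male; columns: Low, High income).
print(round(discrimination(5094, 327, 8731, 2129), 2))  # 0.14
```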
The amount of discrimination in these predictions is (2129 / (8731 + 2129)) - (327 / (5094 + 327)) = 0.20 - 0.06 = 0.14. Thus, surprisingly, the total amount of discrimination has decreased. Notice, however, that the total positive class probability has dropped from 0.24 to 0.15; i.e., fewer people are assigned the class label "High income". This drop artificially lowers the discrimination score. We correct for it by lowering the decision threshold of the Naive Bayes classifier until the positive class probability again reaches 0.24. This results in the following table:
Table 14.5 The gender-predicted income contingency table for the test-set, corrected to maintain the positive class (high income) probability

          Low income   High income
Female          4958           463
Male            7416          3444
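The threshold correction can be sketched as follows. Assuming we have the classifier's predicted probabilities for the positive class (the scores below are random stand-ins, not the chapter's actual Naive Bayes outputs), lowering the decision threshold until 24% of instances are labeled positive amounts to taking the (1 - 0.24) quantile of the scores:

```python
import numpy as np

# Hypothetical P(high income) scores for the 16281 test instances;
# illustrative stand-ins for the real Naive Bayes outputs.
rng = np.random.default_rng(42)
scores = rng.random(16281)

target_rate = 0.24  # positive class probability of the actual test labels

# The threshold that labels 24% of instances positive is the 0.76-quantile.
threshold = np.quantile(scores, 1 - target_rate)
positive = scores >= threshold
print(round(positive.mean(), 2))  # 0.24

# The corrected counts of Table 14.5 then give a discrimination of:
print(round(3444 / (7416 + 3444) - 463 / (4958 + 463), 2))  # 0.23
```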
The positive class probability for females is now 0.09, while for males it is 0.32, resulting in a total discrimination of 0.32 - 0.09 = 0.23. This is considerably worse than the amount of discrimination in the actual labels of the test-set. One may wonder why this is such a big problem, since the data already told us that females are less likely to have high incomes. Suppose, however, that such a discriminatory classifier is used in a decision support system for deciding whether to grant a loan to a new applicant. Let us take a look at a part of the decisions made by such a system:
Table 14.6 The corrected gender-predicted income contingency table for high income test cases

          Low income   High income
Female           319           271
Male            1051          2205
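From these counts, the share of actual high-income cases that the classifier nonetheless labels "Low income" differs sharply by gender; a quick check:

```python
# Share of actual high-income test cases labeled "Low income" by the
# corrected classifier (counts from Table 14.6).
miss_female = 319 / (319 + 271)
miss_male = 1051 / (1051 + 2205)
print(round(miss_female, 2), round(miss_male, 2))  # 0.54 0.32
```

Over half of the high-income females are mislabeled, versus roughly a third of the high-income males.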
This table shows the labels assigned by the classifier to the people in the test-set who actually have a high income. Those assigned a low income in this table are the false negatives. In the banking example, these are the people who are falsely denied a loan by the classifier. These false negatives are very important for a decision support system, because denying a loan to someone that should
 