One could also decide to remove from the dataset all of the attributes that
correlate with the sensitive ones. Although this would resolve the discrimination
problem, a lot of useful information would be lost in the process. The occupation
of a person, for example, is a very important variable when deciding whether to
give a loan or not. The occupation attribute can hence, at the same time, reveal
information on gender and give useful, non-discriminatory information on loan
defaulting. We provide solutions that make use of all the available information,
but in a non-discriminatory way.
14.4 Discrimination-Free Naive Bayes Classifiers
In this section, we provide three approaches for removing discrimination from a
Naive Bayes classifier.
14.4.1 Using Different Decision Thresholds
The most straightforward method for removing discrimination is to set the
decision thresholds differently for the different sensitive values. For instance,
we can decide to give a high income label to females if the high income
probability is greater than 0.1, but to males only if it is greater than 0.6.
This instantly reduces discrimination by favoring females. Note that this is a
very direct form of positive discrimination: even though the model considers some
males more likely to belong to the positive class than some females, it still
predicts the negative class for these males and the positive class for these
females.
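The thresholding rule described above can be sketched in a few lines of Python. This is only an illustration: the function name predict_with_group_thresholds and the probability values are hypothetical, and the probabilities are assumed to come from some already-trained classifier. Only the thresholds 0.1 and 0.6 are taken from the example in the text.

```python
def predict_with_group_thresholds(probs, groups, thresholds):
    """Assign the positive label (1) when the predicted probability
    exceeds the threshold of the individual's sensitive group."""
    return [1 if p > thresholds[g] else 0 for p, g in zip(probs, groups)]


# Hypothetical high-income probabilities from an already-trained model.
probs = [0.20, 0.50, 0.05, 0.70]
groups = ["female", "male", "female", "male"]

# Thresholds from the example in the text.
thresholds = {"female": 0.1, "male": 0.6}

labels = predict_with_group_thresholds(probs, groups, thresholds)
print(labels)
```

In this toy input, the male with probability 0.50 receives the negative label while the female with probability 0.20 receives the positive one, which is exactly the positive discrimination effect noted above.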
When using different decision thresholds for different sensitive values, an
important question to ask is which ones to use, and why. The answer to this
question depends heavily on the situation. It is well known that the choice of
decision threshold influences the number of true positives, false positives, true
negatives, and false negatives. Since the importance of these values differs per
application, several analysis techniques such as ROC (receiver operating
characteristic) analysis (Lachiche & Flach, 2003) exist to aid in setting this
threshold smartly. When different decision thresholds are used for different
sensitive attribute values, the threshold settings additionally influence the
amount of positive and negative discrimination. Ideally, these should be taken
into account when performing such an analysis.
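As a small illustration of how the threshold shifts these counts, the sketch below tallies true/false positives and negatives for a single threshold. The helper confusion_counts and the toy data are hypothetical and are not taken from the source; they only show the direction of the effect.

```python
def confusion_counts(probs, truths, threshold):
    """Count true/false positives and negatives for one decision threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(probs, truths):
        predicted = 1 if p > threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["tn" if y == 0 else "fn"] += 1
    return counts


# Hypothetical predicted probabilities and true labels.
probs = [0.9, 0.7, 0.4, 0.3, 0.2]
truths = [1, 0, 1, 0, 0]

for t in (0.25, 0.5, 0.75):
    print(t, confusion_counts(probs, truths, t))
```

Raising the threshold trades false positives for false negatives. Running the same count separately per sensitive group, with a different threshold for each, would show how the choice also changes the share of positives each group receives.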
In our work, we assume that the number of people who are assigned the positive
class should remain the same. In many applications, keeping this number close to
the number of positive labels in the data-set is highly favorable. For instance,
in the setting of banks assigning loans to individuals, the bank does not
suddenly want to assign fewer or more loans. In addition, as explained in
Section 3, this assumption makes comparing the different techniques on their
discrimination score a lot fairer. We set the decision thresholds using a simple
algorithm:
1. Calculate the number of positive class labels P assigned to the data-set.
2. Learn a Naive Bayes classifier on the data-set.
3. Set the decision thresholds T+ and T- for the favored and discriminated
sensitive values, respectively, to 0.5.
4. Calculate the amount of discrimination in the data-set when using T+ and T-.