Introducing Positive Discrimination in Predictive Models - Discrimination and Privacy in the Information Society

Obviously, one could also decide to remove from the dataset all of the attributes that correlate with the sensitive ones. Although this would resolve the discrimination problem, a lot of useful information would be lost in the process. In fact, the occupation of a person is a very important decision variable when deciding whether to give a loan or not. The occupation attribute can hence, at the same time, reveal information on gender and give useful, non-discriminatory information on loan defaulting. We provide solutions that make use of all the available information, but in a non-discriminatory way.

14.4 Discrimination-Free Naive Bayes Classifiers

In this section, we provide three approaches for removing discrimination from a Naive Bayes classifier.

14.4.1 Using Different Decision Thresholds

The most straightforward method for removing discrimination is to modify the decision thresholds differently for the different sensitive values. For instance, we can decide to give a high income label to females if the high income probability is greater than 0.1, but to males only if it is greater than 0.6. This instantly reduces discrimination by favoring females. Note that this is a very direct form of positive discrimination: even though the model considers some males more likely to belong to the positive class than some females, it still predicts a negative class for these males and a positive class for the females.
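The thresholding rule above can be illustrated with a minimal Python sketch. The function name `predict_with_group_thresholds`, the group codes `"M"` and `"F"`, and the scores are hypothetical and chosen only for illustration; the thresholds 0.1 and 0.6 are the ones used in the example in the text.

```python
def predict_with_group_thresholds(prob_positive, group, thresholds):
    """Assign the positive label (1) when the score exceeds the group's own threshold."""
    return [1 if p > thresholds[g] else 0 for p, g in zip(prob_positive, group)]

# Hypothetical model scores for the high-income class (not taken from any real data).
scores = [0.55, 0.30, 0.15, 0.05, 0.70]
groups = ["M", "M", "F", "F", "F"]

# Thresholds from the example in the text: 0.1 for females, 0.6 for males.
thresholds = {"F": 0.1, "M": 0.6}

labels = predict_with_group_thresholds(scores, groups, thresholds)
print(labels)
```

In this toy input the male with score 0.55 receives the negative label while the female with score 0.15 receives the positive one, which is exactly the direct form of positive discrimination described above.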

When using different decision thresholds for different sensitive values, an important question to ask is which ones to use, and why. The answer to this question depends heavily on the situation. It is well known that the choice of decision threshold influences the number of positives, false positives, negatives, and false negatives. Since the importance of these values differs per application, several analysis techniques, such as ROC (receiver operating characteristic) analysis (Lachiche & Flach, 2003), exist to aid in setting this threshold smartly. When different decision thresholds are used for different sensitive attribute values, the threshold settings additionally influence the amount of positive and negative discrimination. Ideally, these should be taken into account when performing such an analysis.
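To make the effect of the threshold on these counts concrete, the following sketch tabulates true and false positives and negatives for a few thresholds. It is an illustration only, not the ROC procedure of Lachiche & Flach; the helper `confusion_counts` and the scores and labels are hypothetical.

```python
def confusion_counts(scores, truth, threshold):
    """Count true/false positives and negatives when predicting 1 for score > threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for s, y in zip(scores, truth):
        predicted = 1 if s > threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts

# Hypothetical scores and true labels, for illustration only.
scores = [0.9, 0.7, 0.4, 0.2]
truth = [1, 0, 1, 0]

# Each threshold yields a different trade-off between the four counts.
table = {t: confusion_counts(scores, truth, t) for t in (0.1, 0.5, 0.8)}
for t, c in table.items():
    print(t, c)
```

Running the same tabulation separately per sensitive value would show how a pair of group-specific thresholds shifts these counts for each group.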

In our work, we assume that the number of people that are assigned a positive class should remain the same. In many applications, keeping this number close to the number of positive labels in the dataset is highly favorable. For instance, in the setting of banks assigning loans to individuals, the bank does not suddenly want to assign fewer or more loans. In addition, as explained in Section 3, this assumption makes comparing the different techniques on their discrimination score considerably fairer. We set the decision thresholds using a simple algorithm:

1. Calculate the number of positive class labels P assigned to the dataset.

2. Learn a Naive Bayes classifier on the dataset.

3. Set the decision thresholds T+ and T- for the favored and discriminated sensitive values to 0.5.

4. Calculate the amount of discrimination in the dataset when using T+ and T-.
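Steps 1 to 4 as listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the categorical Naive Bayes with add-one smoothing, the toy data, and all function names are hypothetical, and the discrimination measure is assumed here to be the difference in positive-prediction rate between the favored and the discriminated group, since this excerpt does not define it.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Fit a categorical Naive Bayes model (add-one smoothing is an assumption)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)            # feature index -> observed values
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            values[j].add(v)
    return class_counts, value_counts, values

def prob_positive(model, row):
    """Posterior probability of class 1 for a single row."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    joint = {}
    for c, n_c in class_counts.items():
        p = n_c / total
        for j, v in enumerate(row):
            p *= (value_counts[(j, c)][v] + 1) / (n_c + len(values[j]))
        joint[c] = p
    return joint.get(1, 0.0) / sum(joint.values())

def discrimination(predictions, sensitive, favored, discriminated):
    """Assumed measure: positive rate of favored group minus that of discriminated group."""
    def rate(g):
        member = [p for p, s in zip(predictions, sensitive) if s == g]
        return sum(member) / len(member)
    return rate(favored) - rate(discriminated)

# Hypothetical toy data: (gender, occupation) -> high income (1) or not (0).
rows = [("M", "eng"), ("M", "eng"), ("M", "clerk"), ("M", "eng"),
        ("F", "clerk"), ("F", "clerk"), ("F", "eng"), ("F", "clerk")]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
sensitive = [r[0] for r in rows]

P = sum(labels)                               # step 1: number of positive labels
model = train_naive_bayes(rows, labels)       # step 2: learn the classifier
thresholds = {"M": 0.5, "F": 0.5}             # step 3: T+ (favored) and T- (discriminated)
scores = [prob_positive(model, r) for r in rows]
predictions = [1 if s > thresholds[g] else 0 for s, g in zip(scores, sensitive)]
disc = discrimination(predictions, sensitive, "M", "F")  # step 4
print(P, predictions, disc)
```

On this toy data the classifier assigns the positive class to three of the four males and one of the four females, so the assumed discrimination score with both thresholds at 0.5 is 0.5.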
