The first possible solution is to remove the sensitive attribute from the training data. For example, if gender is the sensitive attribute in university admission decisions, one would first think of excluding the gender information from the training data. Unfortunately, as we saw in the previous section (Table 1), this solution does not help if some other attributes are correlated with the sensitive attribute.
Consider an extreme example based on a fictitious lending-decisions dataset in Table 2. If we remove the column "Ethnicity" and learn a model over the remaining dataset, the model may learn that if the postal code starts with 12 then the decision should be positive, otherwise the decision should be negative. We see that, for instance, customers #4 and #5 have identical characteristics except the ethnicity, and yet they will be offered different decisions. Such a situation is generally considered to be discriminatory.
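To make this concrete, the following sketch (not an experiment from this chapter; the rows, attribute names and values are all made up to mirror the Table 2 scenario) trains a plain decision tree after dropping the sensitive column and shows that the model still reaches its decisions through the postal-code proxy.

from sklearn.tree import DecisionTreeClassifier

# Hypothetical lending records mirroring the Table 2 scenario:
# (postal_code, income, ethnicity, decision) -- all values are made up.
records = [
    ("1210", 40, "native",  "+"),
    ("1225", 38, "native",  "+"),
    ("1230", 42, "native",  "+"),
    ("3410", 41, "foreign", "-"),
    ("3425", 40, "foreign", "-"),
    ("3430", 39, "foreign", "-"),
]

# Drop the sensitive attribute (ethnicity); keep postal code and income.
X = [[1 if postal.startswith("12") else 0, income]
     for postal, income, _ethnicity, _ in records]
y = [decision for *_, decision in records]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Two applicants with identical income who differ only in neighbourhood
# (and, implicitly, ethnicity) still receive different decisions.
print(model.predict([[1, 40], [0, 40]]))  # typically ['+' '-']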
The next step would be to remove the correlated attributes as well. This seems straightforward in our example dataset; however, it is problematic if the attribute to be removed also carries some objective information about the label. Suppose a postal code is related to ethnicity, but also carries information about real estate prices in the neighborhood. A bank would like to use the information about the neighborhood, but not the information about ethnicity, when deciding on a loan. If the ethnicity is removed from the data, a computational model can still predict the ethnicity indirectly (internally), based on the postal code. If we remove the postal code, we also remove the objective information about real estate prices that would be useful for decision making. Therefore, more advanced discrimination handling techniques are required.
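One way to see how strong such an indirect link is, is to check how well the remaining attributes predict the removed sensitive attribute. The short sketch below (again using the made-up records from the previous example, not data from this chapter) recovers ethnicity from the postal code alone, which is precisely what a lending model could do internally.

from sklearn.tree import DecisionTreeClassifier

# Same hypothetical records as before: (postal_code, income, ethnicity, decision).
records = [
    ("1210", 40, "native",  "+"), ("1225", 38, "native",  "+"),
    ("1230", 42, "native",  "+"), ("3410", 41, "foreign", "-"),
    ("3425", 40, "foreign", "-"), ("3430", 39, "foreign", "-"),
]

# Try to reconstruct the removed sensitive attribute from the postal code only.
X = [[1 if postal.startswith("12") else 0] for postal, *_ in records]
ethnicity = [eth for _, _, eth, _ in records]

proxy = DecisionTreeClassifier(random_state=0).fit(X, ethnicity)
print(proxy.score(X, ethnicity))  # 1.0 -> the postal code fully reveals ethnicity

In this toy setting the only way to break the link is to drop the postal code as well, which also discards the real estate information the bank legitimately wants to use.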
Building Separate Models for the Sensitive Groups
The next solution that comes to mind is to train separate models for individual sensitive groups, for example, one for males, and one for females. It may seem that each model is objective, since individual models do not include gender information. Unfortunately, this does not solve the problem either if the historical decisions are discriminatory.
Table 3 Example (fictitious) dataset on university admissions
Applicant no.   Gender   Test score   Level   Acceptance
#1              Male     82           A       +
#2              Female   85           A       +
#3              Male     75           B       +
#4              Female   75           B       -
#5              Male     65           A       -
#6              Female   62           A       -
#7              Male     91           B       +
#8              Female   81           B       +
Consider a simplified example of a university admission case in Table 3. If we build a model for females using only data from females, the model will learn that every female who scores at least 80 on the test should be accepted. Similarly, a model for males, trained only on data from males, will learn that every male who scores at least 75 should be accepted. The separate models thus inherit the different historical acceptance thresholds for the two groups.
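As a quick illustration (a sketch using the Table 3 numbers with a plain decision tree as the learner; the chapter does not prescribe a particular model), training one model per gender simply bakes the two different historical acceptance thresholds into the two models.

from sklearn.tree import DecisionTreeClassifier

# Table 3 rows: (gender, test_score, level, accepted)
data = [
    ("Male", 82, "A", "+"), ("Female", 85, "A", "+"),
    ("Male", 75, "B", "+"), ("Female", 75, "B", "-"),
    ("Male", 65, "A", "-"), ("Female", 62, "A", "-"),
    ("Male", 91, "B", "+"), ("Female", 81, "B", "+"),
]

# One separate model per gender, trained only on that group's rows.
models = {}
for group in ("Male", "Female"):
    rows = [r for r in data if r[0] == group]
    X = [[score, 1 if level == "A" else 0] for _, score, level, _ in rows]
    y = [label for *_, label in rows]
    models[group] = DecisionTreeClassifier(random_state=0).fit(X, y)

# A new applicant with test score 77, level B: the male model accepts
# (males were historically admitted from about 75 up), while the female
# model rejects (females needed roughly 80 or more).
for group, model in models.items():
    print(group, model.predict([[77, 0]]))

Each model looks gender-neutral on its own, yet together they reproduce the discriminatory pattern present in the historical decisions.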