The first possible solution is to remove the sensitive attribute from the training data. For example, if gender is the sensitive attribute in university admission decisions, one would first think of excluding the gender information from the training data. Unfortunately, as we saw in the previous section (Table 1), this solution does not help if some other attributes are correlated with the sensitive attribute.
Consider an extreme example based on a fictitious lending-decisions dataset in Table 2. If we remove the column "Ethnicity" and learn a model over the remaining dataset, the model may learn that if the postal code starts with 12 then the decision should be positive, otherwise the decision should be negative. We see that, for instance, customers #4 and #5 have identical characteristics except the ethnicity, and yet they will be offered different decisions. Such a situation is generally considered to be discriminatory.
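To make this concrete, the following sketch (not an experiment from this chapter; the rows, attribute names and values are all made up to mirror the Table 2 scenario) trains a plain decision tree after dropping the sensitive column and shows that the model still reaches its decisions through the postal-code proxy.

from sklearn.tree import DecisionTreeClassifier

# Hypothetical lending records mirroring the Table 2 scenario:
# (postal_code, income, ethnicity, decision) -- all values are made up.
records = [
    ("1210", 40, "native",  "+"),
    ("1225", 38, "native",  "+"),
    ("1230", 42, "native",  "+"),
    ("3410", 41, "foreign", "-"),
    ("3425", 40, "foreign", "-"),
    ("3430", 39, "foreign", "-"),
]

# Drop the sensitive attribute (ethnicity); keep postal code and income.
X = [[1 if postal.startswith("12") else 0, income]
     for postal, income, _ethnicity, _ in records]
y = [decision for *_, decision in records]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Two applicants with identical income who differ only in neighbourhood
# (and, implicitly, ethnicity) still receive different decisions.
print(model.predict([[1, 40], [0, 40]]))  # typically ['+' '-']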
The next step would be to remove the correlated attributes as well. This seems straightforward in our example dataset; however, it is problematic if the attribute to be removed also carries some objective information about the label. Suppose a postal code is related to ethnicity, but also carries information about real estate prices in the neighborhood. A bank would like to use the information about the neighborhood, but not the information about ethnicity, when deciding on a loan. If the ethnicity is removed from the data, a computational model can still predict the ethnicity indirectly (internally), based on the postal code. If we remove the postal code, we also remove the objective information about real estate prices that would be useful for decision making. Therefore, more advanced discrimination handling techniques are required.
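One way to see how strong such an indirect link is, is to check how well the remaining attributes predict the removed sensitive attribute. The short sketch below (again using the made-up records from the previous example, not data from this chapter) recovers ethnicity from the postal code alone, which is precisely what a lending model could do internally.

from sklearn.tree import DecisionTreeClassifier

# Same hypothetical records as before: (postal_code, income, ethnicity, decision).
records = [
    ("1210", 40, "native",  "+"), ("1225", 38, "native",  "+"),
    ("1230", 42, "native",  "+"), ("3410", 41, "foreign", "-"),
    ("3425", 40, "foreign", "-"), ("3430", 39, "foreign", "-"),
]

# Try to reconstruct the removed sensitive attribute from the postal code only.
X = [[1 if postal.startswith("12") else 0] for postal, *_ in records]
ethnicity = [eth for _, _, eth, _ in records]

proxy = DecisionTreeClassifier(random_state=0).fit(X, ethnicity)
print(proxy.score(X, ethnicity))  # 1.0 -> the postal code fully reveals ethnicity

In this toy setting the only way to break the link is to drop the postal code as well, which also discards the real estate information the bank legitimately wants to use.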
Building Separate Models for the Sensitive Groups
The next solution that comes to mind is to train separate models for individual sensitive groups, for example, one for males, and one for females. It may seem that each model is objective, since individual models do not include gender information. Unfortunately, this does not solve the problem either if the historical decisions are discriminatory.
Table 3 Example (fictitious) dataset on university admissions
Applicant no.   Gender   Test score   Level   Acceptance
#1              Male     82           A       +
#2              Female   85           A       +
#3              Male     75           B       +
#4              Female   75           B       -
#5              Male     65           A       -
#6              Female   62           A       -
#7              Male     91           B       +
#8              Female   81           B       +
Consider a simplified example of a university admission case in Table 3. If we build a model for females using only data from females, the model will learn that every female who scores at least 80 on the test should be accepted. Similarly, a model for males, trained only on data from males, will learn that every male who scores at least 75 should be accepted. The separate models thus inherit the different historical acceptance thresholds for the two groups.
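As a quick illustration (a sketch using the Table 3 numbers with a plain decision tree as the learner; the chapter does not prescribe a particular model), training one model per gender simply bakes the two different historical acceptance thresholds into the two models.

from sklearn.tree import DecisionTreeClassifier

# Table 3 rows: (gender, test_score, level, accepted)
data = [
    ("Male", 82, "A", "+"), ("Female", 85, "A", "+"),
    ("Male", 75, "B", "+"), ("Female", 75, "B", "-"),
    ("Male", 65, "A", "-"), ("Female", 62, "A", "-"),
    ("Male", 91, "B", "+"), ("Female", 81, "B", "+"),
]

# One separate model per gender, trained only on that group's rows.
models = {}
for group in ("Male", "Female"):
    rows = [r for r in data if r[0] == group]
    X = [[score, 1 if level == "A" else 0] for _, score, level, _ in rows]
    y = [label for *_, label in rows]
    models[group] = DecisionTreeClassifier(random_state=0).fit(X, y)

# A new applicant with test score 77, level B: the male model accepts
# (males were historically admitted from about 75 up), while the female
# model rejects (females needed roughly 80 or more).
for group, model in models.items():
    print(group, model.predict([[77, 0]]))

Each model looks gender-neutral on its own, yet together they reproduce the discriminatory pattern present in the historical decisions.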