Table 12.2 Sample relation for the income class example with positive class probability

Sex  Ethnicity   Highest Degree  Job Type    Class  Prob
m    native      university      board       +      .99
m    native      high school     board       +      .90
m    native      university      education   +      .92
m    non-native  university      healthcare  +      .76 *
m    non-native  none            healthcare  -      .44
f    non-native  high school     board       -      .09
f    native      university      education   -      .66 *
f    native      none            healthcare  +      .66
f    non-native  high school     education   -      .02
f    native      university      board       +      .92

(* rows set in bold in the original: the relabel candidates of Example 1)
this probability exceeds 0.5, the object is assigned to the positive class. The objects
close to the decision boundary are those with a probability close to 0.5. These
objects are selected first for relabeling.
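As a hypothetical illustration (not code from the chapter), the probabilities of Table 12.2 can be ranked by their distance to the 0.5 decision boundary, so that the most uncertain objects come first:

```python
# Positive-class probabilities from Table 12.2, in row order.
probs = [0.99, 0.90, 0.92, 0.76, 0.44, 0.09, 0.66, 0.66, 0.02, 0.92]

# A smaller |p - 0.5| means the object sits closer to the decision boundary.
ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
print(ranked[:3])  # indices of the three objects closest to the boundary
```

Here index 4 (the male with probability .44) is closest to the boundary, followed by the two objects with probability .66.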
Example 1. Consider again the dataset D given in Table 12.1. We want to learn
a classifier to predict the class of objects for which the predictions are
non-discriminatory towards Sex = f. In this example we rank the objects by their
positive class probability given by a Naive Bayes classification model. In
Table 12.2 the positive class probabilities as given by this ranker are added to
the table for reference (calculated using the "NBS" classifier of Weka (Hall et
al., 2009)).
In the second step, we arrange the data separately for female applicants with
class − in descending order and for male applicants with class + in ascending
order with respect to their positive class probability. Relabeling one promotion
candidate and one demotion candidate makes the data discrimination-free. Hence,
we relabel the top promotion candidate, that is, the highest scoring female with
a negative class label, and the top demotion candidate, that is, the lowest
scoring male with a positive class label (the bold examples in Table 12.2).
After the labels for these instances are changed, the discrimination decreases
from 40% to 0%. The resulting dataset is used as a training set for classifier
induction.
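The massaging step can be sketched as follows (a minimal illustration on the Table 12.2 rows, not the authors' code; each row is a (sex, class label, positive-class probability) triple):

```python
# Rows of Table 12.2 as (sex, class label, positive-class probability).
data = [
    ("m", "+", 0.99), ("m", "+", 0.90), ("m", "+", 0.92),
    ("m", "+", 0.76), ("m", "-", 0.44), ("f", "-", 0.09),
    ("f", "-", 0.66), ("f", "+", 0.66), ("f", "-", 0.02),
    ("f", "+", 0.92),
]

def discrimination(rows):
    """Difference in positive-class rates: P(+ | m) - P(+ | f)."""
    rate = lambda s: (sum(1 for r in rows if r[0] == s and r[1] == "+")
                      / sum(1 for r in rows if r[0] == s))
    return rate("m") - rate("f")

print(discrimination(data))  # 0.4, i.e. 40% discrimination

# Top promotion candidate: the highest-scoring female with a negative label.
promote = max((i for i, r in enumerate(data) if r[0] == "f" and r[1] == "-"),
              key=lambda i: data[i][2])
# Top demotion candidate: the lowest-scoring male with a positive label.
demote = min((i for i, r in enumerate(data) if r[0] == "m" and r[1] == "+"),
             key=lambda i: data[i][2])

data[promote] = (data[promote][0], "+", data[promote][2])
data[demote] = (data[demote][0], "-", data[demote][2])
print(discrimination(data))  # 0.0 after relabeling one pair
```

Swapping exactly one label in each group leaves the overall number of positive labels unchanged, which is why the positive class probability of the dataset is preserved.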
12.3.1.2 Reweighing and Resampling
The massaging approach is rather intrusive as it changes the class labels of the
objects. Our second approach does not have this disadvantage. Instead of
relabeling the objects, different weights are attached to them. For example, the
deprived community objects with X(Class) = + get higher weights than the
deprived community objects with X(Class) = −, and the favored community objects
with X(Class) = + get lower weights than the favored community objects with
X(Class) = −. We refer to this method as reweighing. Again we assume that we
want to reduce the discrimination to 0 while maintaining the overall positive
class probability. We now discuss the idea behind the weight calculation.
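One way to obtain such weights, sketched below under the assumption that each (Sex, Class) combination is weighted by its expected frequency (if Sex and Class were independent) divided by its observed frequency, is the following hypothetical illustration on the Table 12.2 counts (not the chapter's code):

```python
# Weight sketch: W(s, c) = P_expected(s, c) / P_observed(s, c), where the
# expected probability assumes independence of Sex and Class.
from collections import Counter

# (sex, class) pairs matching the counts in Table 12.2.
rows = [("m", "+")] * 4 + [("m", "-")] * 1 + [("f", "+")] * 2 + [("f", "-")] * 3

n = len(rows)
sex = Counter(s for s, _ in rows)   # marginal counts per sex
cls = Counter(c for _, c in rows)   # marginal counts per class
cell = Counter(rows)                # observed counts per (sex, class) cell

weights = {sc: sex[sc[0]] * cls[sc[1]] / (n * cell[sc]) for sc in cell}
for sc in sorted(weights):
    print(sc, round(weights[sc], 3))
```

On these counts the female objects with class + receive weight 1.5 against roughly 0.667 for the female objects with class −, while the male objects with class + receive 0.75 against 2.0 for the male objects with class −, matching the intuition described above.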