not remove the age-discrimination, as many other attributes, such as own house,
indicating whether the applicant is a home-owner, turn out to be good predictors for age. A
parallel can be drawn with the practice of redlining: denying inhabitants of particular
racially determined areas access to services such as loans. The term describes the now abolished
practice of marking a red line on a map to delineate the area where banks would not
invest; later the term was used for indirect discrimination against a particular group
of people (usually by race or sex) no matter the geography.¹
12.2.2 Measuring Discrimination
There are many different ways in which discrimination could be quantified, and
each of them has its own advantages and disadvantages. In this chapter, as
in our earlier work (Calders et al., 2009; Kamiran & Calders, 2010; Kamiran et
al., 2010b; Kamiran & Calders, 2009a,b; Kamiran et al., 2010a; Calders & Verwer,
2010), we define the level of discrimination in a dataset as the difference between
the probability that someone from the favored group gets a positive class and the
probability that someone from the deprived community gets a positive class. For
alternative measures of discrimination, see Chapters 5 and 6 of this book.
For the running example of Table 12.1, the discrimination with respect to the
deprived community Sex = female is 4/5 - 2/5 = 40%. Formally, for a sensitive attribute
$S$, deprived community (sensitive attribute value) $f$, and favored community $m$, the
discrimination in $D$ with respect to the group $S = f$, denoted $\mathrm{disc}_{S=f}(D)$,
is defined as:

$$
\mathrm{disc}_{S=f}(D) :=
\frac{|\{X \in D \mid X(S) = m,\ X(\mathit{Class}) = +\}|}{|\{X \in D \mid X(S) = m\}|}
-
\frac{|\{X \in D \mid X(S) = f,\ X(\mathit{Class}) = +\}|}{|\{X \in D \mid X(S) = f\}|}.
$$
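As a minimal sketch, the dataset-level measure can be computed by counting positive labels within each group. The function name `discrimination`, the dictionary-based row format, and the ten example rows are assumptions made for illustration; the rows are not Table 12.1 itself, only a toy dataset chosen to reproduce its group rates of 4/5 and 2/5.

```python
from fractions import Fraction


def discrimination(rows, sensitive="Sex", deprived="f", favored="m",
                   label="Class", positive="+"):
    """Positive rate of the favored group minus that of the deprived group.

    `rows` is a list of dicts; all argument names are illustrative assumptions.
    """
    def positive_rate(group):
        members = [r for r in rows if r[sensitive] == group]
        if not members:
            raise ValueError(f"no examples with {sensitive} = {group!r}")
        positives = sum(1 for r in members if r[label] == positive)
        return Fraction(positives, len(members))

    return positive_rate(favored) - positive_rate(deprived)


# Hypothetical data: 5 favored examples (4 positive), 5 deprived (2 positive).
rows = (
    [{"Sex": "m", "Class": "+"}] * 4
    + [{"Sex": "m", "Class": "-"}] * 1
    + [{"Sex": "f", "Class": "+"}] * 2
    + [{"Sex": "f", "Class": "-"}] * 3
)

print(discrimination(rows))  # 2/5, i.e. 40%
```

Exact fractions are used here so that the result matches the hand calculation without floating-point rounding; a value of 0 would mean both groups receive the positive class at the same rate.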
When measuring the discrimination of a classifier, we want to assess how the
classifier will act on new, previously unseen examples. We assume a setting in which one
example comes at a time, and the classifier needs to assign a label to it
immediately. In order to assess the level of discrimination of the classifier when it is
applied to unseen examples, we use a test-set; that is, following standard machine
learning practice, before learning a classifier, we split the dataset into two parts: one
for learning the classifier, and one for measuring its quality. The examples of the
test-set (with their labels removed) are passed one by one to the classifier and its
decisions are recorded. After that, the discrimination of the classifier can be assessed
as follows. The discrimination of the classifier $C$ with respect to the group $S = f$ on
a test dataset $D_{test}$, denoted $\mathrm{disc}_{S=f}(C, D_{test})$, is defined as:
¹ Source: http://en.wikipedia.org/wiki/redlining, November 17th, 2011.