each rule comes with a confidence measure, stating the probability of the decision
given the premises of the rule; for instance, the rule
RACE=BLACK, CITY=NYC → CLASS=BAD with confidence 0.75 states that black people
from NYC are assigned bad credit with a 75% probability.
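
As a minimal sketch of how such a confidence value can be computed, the snippet below counts, among the records that satisfy the premise, the fraction that also carry the decision. The records, the attribute encoding as dictionaries, and the helper names `matches` and `confidence` are illustrative assumptions, not part of the original text.

```python
# Toy decision records; the values are invented for illustration only.
records = [
    {"RACE": "BLACK", "CITY": "NYC", "CLASS": "BAD"},
    {"RACE": "BLACK", "CITY": "NYC", "CLASS": "BAD"},
    {"RACE": "BLACK", "CITY": "NYC", "CLASS": "BAD"},
    {"RACE": "BLACK", "CITY": "NYC", "CLASS": "GOOD"},
    {"RACE": "WHITE", "CITY": "NYC", "CLASS": "GOOD"},
]


def matches(record, items):
    """True if the record has every attribute=value pair in `items`."""
    return all(record.get(attr) == value for attr, value in items.items())


def confidence(records, premise, decision):
    """Fraction of records satisfying `premise` that also satisfy `decision`.

    Returns None when no record satisfies the premise.
    """
    covered = [r for r in records if matches(r, premise)]
    if not covered:
        return None
    hits = sum(1 for r in covered if matches(r, decision))
    return hits / len(covered)


# Confidence of RACE=BLACK, CITY=NYC -> CLASS=BAD on the toy data.
conf = confidence(records, {"RACE": "BLACK", "CITY": "NYC"}, {"CLASS": "BAD"})
print(conf)
```

On this toy dataset, three of the four records matching the premise carry CLASS=BAD, which reproduces the 0.75 confidence used in the running example.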
Three kinds of facts (items) are used in decision rules: (potentially) discriminatory
items, such as RACE=BLACK; (potentially) non-discriminatory items, such as
CITY=NYC; and decision items, such as CLASS=BAD. The potentially discriminatory
items are specified by a reference legal framework and denote the designated
groups of people protected by anti-discrimination laws. The non-discriminatory
items define the context in which a discriminatory decision may take place: here, the
set of applicants from the city of NYC.
Given a historical dataset of decision records, the decision rules hidden in the
dataset can be found using association rule mining, which extracts all the
classification rules of the desired form that, in the source dataset, are supported
by a specified minimum number of decisions. Continuing the example, the rule
RACE=BLACK, CITY=NYC → CLASS=BAD is automatically found by association rule
mining if the number of black people in NYC receiving bad credit reaches a
minimum threshold value. Such a threshold, known as the minimum support, is
meaningful from a legal viewpoint, since it accounts for a minimum number
of possibly discriminated persons.
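
The role of the minimum support can be illustrated with a deliberately naive sketch that enumerates every premise occurring in the data and keeps the rules backed by at least `min_support` records. This brute-force enumeration stands in for a real association rule mining algorithm (such as Apriori), which prunes the search space far more efficiently; the function name `mine_rules`, the record encoding, and the toy data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy decision records; the values are invented for illustration only.
records = [
    {"RACE": "BLACK", "CITY": "NYC", "CLASS": "BAD"},
    {"RACE": "BLACK", "CITY": "NYC", "CLASS": "BAD"},
    {"RACE": "BLACK", "CITY": "NYC", "CLASS": "BAD"},
    {"RACE": "BLACK", "CITY": "NYC", "CLASS": "GOOD"},
    {"RACE": "WHITE", "CITY": "NYC", "CLASS": "GOOD"},
]


def mine_rules(records, decision_attr, min_support):
    """Return {(premise, decision): count} for rules with count >= min_support.

    A premise is a tuple of (attribute, value) pairs; the count is the number
    of records that satisfy both the premise and the decision.
    """
    counts = Counter()
    for record in records:
        decision = (decision_attr, record[decision_attr])
        items = sorted(
            (attr, value) for attr, value in record.items() if attr != decision_attr
        )
        # Every non-empty subset of the record's items is a candidate premise.
        for size in range(1, len(items) + 1):
            for premise in combinations(items, size):
                counts[(premise, decision)] += 1
    return {rule: n for rule, n in counts.items() if n >= min_support}


rules = mine_rules(records, decision_attr="CLASS", min_support=3)
for (premise, decision), count in sorted(rules.items()):
    left = ", ".join(f"{a}={v}" for a, v in premise)
    print(f"{left} -> {decision[0]}={decision[1]} (support {count})")
```

With `min_support=3`, the rule RACE=BLACK, CITY=NYC → CLASS=BAD survives because three records back it, while rules concluding CLASS=GOOD are dropped for lack of support.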
In which circumstances does an extracted rule reveal a (possibly unintentional)
discriminatory decision strategy? The idea here is to weigh the discrimination of a
rule by the gain in confidence due to the presence of the potentially discriminatory
items in the premise of the rule. In the example, we compare the 0.75 confidence
of the rule RACE=BLACK, CITY=NYC → CLASS=BAD with the confidence of the
rule obtained by removing the first item, i.e., CITY=NYC → CLASS=BAD. If, e.g., the
confidence of the latter rule is 0.25, then we conclude that black people in NYC have
a probability of being assigned bad credit that is 3 times that of the
general population of NYC. In this case, a measure called elift is used to quantify
discrimination risk; it is defined as the ratio of the confidences of the two rules
above (with and without the discriminatory item), here 0.75/0.25 = 3. Whether the
rule in the example is to be considered discriminatory can now be assessed by
thresholding the elift measure, possibly according to a value specified in the
reference legislation that limits the acceptable disproportion of treatment. While
we use elift to illustrate examples throughout the chapter, it is worth noting that
several other measures of discrimination (see Section 5.2.2) have been considered
in the legal and economic literature, none of which is superior to the others. In fact,
our approach is parametric in the choice of the reference measure.
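
The elift computation for the running example can be sketched as follows. The function name `elift` and the threshold value are illustrative assumptions; in particular, the threshold is an arbitrary placeholder and not a value taken from any legislation.

```python
def elift(conf_with, conf_without):
    """Ratio of the rule's confidence with and without the discriminatory item."""
    if conf_without == 0:
        raise ValueError("elift is undefined when the base rule has confidence 0")
    return conf_with / conf_without


# Confidences from the running example.
conf_with_race = 0.75     # RACE=BLACK, CITY=NYC -> CLASS=BAD
conf_without_race = 0.25  # CITY=NYC -> CLASS=BAD

value = elift(conf_with_race, conf_without_race)

# Hypothetical threshold, chosen only to make the example concrete.
threshold = 2.0
is_flagged = value >= threshold

print(value, is_flagged)
```

Because the approach is parametric in the measure, a different discrimination measure could be swapped in by replacing `elift` with another function of the same two confidences.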
By considering all classification rules with an elift value higher than the
threshold, we can find all the contexts where a discriminatory decision has been
taken: in the example, by enumerating all rules of the form
RACE=BLACK, B → CLASS=BAD, an anti-discrimination analyst discovers all
situations B where black people suffered a discriminatory credit decision, whatever
the complexity of the context B and in compliance with the reference legal framework.
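
A minimal sketch of this enumeration is given below: for every candidate context B built from the non-discriminatory items, it compares the confidence of RACE=BLACK, B → CLASS=BAD with that of B → CLASS=BAD and reports the contexts whose ratio reaches the threshold. The brute-force search, the helper names, the toy data, and the threshold and minimum-support values are all illustrative assumptions rather than the method's actual implementation.

```python
from itertools import combinations


def rows(race, city, cls, n):
    """Build n identical toy records (values invented for illustration)."""
    return [{"RACE": race, "CITY": city, "CLASS": cls} for _ in range(n)]


records = (
    rows("BLACK", "NYC", "BAD", 3) + rows("BLACK", "NYC", "GOOD", 1)
    + rows("WHITE", "NYC", "GOOD", 8)
    + rows("BLACK", "LA", "BAD", 1) + rows("BLACK", "LA", "GOOD", 1)
    + rows("WHITE", "LA", "BAD", 1) + rows("WHITE", "LA", "GOOD", 1)
)


def matches(record, items):
    """True if the record has every (attribute, value) pair in `items`."""
    return all(record.get(attr) == value for attr, value in items)


def confidence(records, premise, decision):
    """Fraction of records matching `premise` that also match `decision`."""
    covered = [r for r in records if matches(r, premise)]
    if not covered:
        return None
    return sum(1 for r in covered if matches(r, [decision])) / len(covered)


def discriminatory_contexts(records, protected, decision, threshold, min_support=1):
    """List (context, ratio) pairs whose confidence ratio reaches `threshold`."""
    context_items = sorted(
        {(a, v) for r in records for a, v in r.items()
         if a not in (protected[0], decision[0])}
    )
    found = []
    for size in range(1, len(context_items) + 1):
        for context in combinations(context_items, size):
            # Support: records matching the protected item, context and decision.
            support = sum(
                1 for r in records if matches(r, (protected, decision) + context)
            )
            if support < min_support:
                continue
            base = confidence(records, context, decision)
            full = confidence(records, (protected,) + context, decision)
            if not base or full is None:
                continue  # ratio undefined for this context
            ratio = full / base
            if ratio >= threshold:
                found.append((context, ratio))
    return found


found = discriminatory_contexts(
    records,
    protected=("RACE", "BLACK"),
    decision=("CLASS", "BAD"),
    threshold=2.0,   # hypothetical threshold, not a legal value
    min_support=2,   # hypothetical minimum support
)
print(found)
```

On the toy data only the context CITY=NYC is reported, with a ratio of 3, matching the running example; the LA context is discarded because too few records support it.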