each rule comes with a confidence measure, stating the probability of the decision
given the premises of the rule; for instance, the rule RACE = BLACK, CITY = NYC →
CLASS = BAD with confidence 0.75 states that black people from NYC are assigned
bad credit with a 75% probability.
Three kinds of facts (items) are used in decision rules: (potentially) discriminatory
items, such as RACE = BLACK, (potentially) non-discriminatory items, such as
CITY = NYC, and decision items, such as CLASS = BAD. The potentially discriminatory
items are specified by a reference legal framework, to denote some designated
groups of people protected by the anti-discrimination laws. The non-discriminatory
items define the context where a discriminatory decision may take place - here, the
set of applicants from the city of NYC.
Given a historical dataset of decision records, the decision rules hidden in the
dataset can be found using association rule mining, which extracts all the
classification rules of the desired form that, in the source dataset, are supported
by a specified minimum number of decisions. Continuing the example, the rule
RACE = BLACK, CITY = NYC → CLASS = BAD is automatically found by association
rule mining if the number of black people in NYC receiving bad credit is
above a minimum threshold value. Such a threshold, known as the minimum support,
is meaningful from a legal viewpoint, since it accounts for a minimum number
of possibly discriminated persons.
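The two quantities attached to a rule, its support and its confidence, can be illustrated with a minimal Python sketch. This is not a rule-mining algorithm: it only counts matching records for one given rule. The helper names (`matches`, `support`, `confidence`), the toy records, the value WHITE, and the threshold `MIN_SUPPORT = 3` are all assumptions made for illustration; the data are invented so that the rule reproduces the 0.75 confidence used in the running example.

```python
from typing import Dict, List

Record = Dict[str, str]


def matches(record: Record, items: Dict[str, str]) -> bool:
    """True if the record contains every attribute=value item."""
    return all(record.get(attr) == value for attr, value in items.items())


def support(records: List[Record], premise: Dict[str, str],
            decision: Dict[str, str]) -> int:
    """Number of records satisfying both the premise and the decision."""
    rule_items = {**premise, **decision}
    return sum(1 for r in records if matches(r, rule_items))


def confidence(records: List[Record], premise: Dict[str, str],
               decision: Dict[str, str]) -> float:
    """Fraction of records satisfying the premise that also get the decision."""
    covered = [r for r in records if matches(r, premise)]
    if not covered:
        raise ValueError("no record satisfies the premise")
    return sum(1 for r in covered if matches(r, decision)) / len(covered)


# Invented toy data (not from any real dataset): 12 applicants from NYC.
records = (
    [{"RACE": "BLACK", "CITY": "NYC", "CLASS": "BAD"}] * 3
    + [{"RACE": "BLACK", "CITY": "NYC", "CLASS": "GOOD"}] * 1
    + [{"RACE": "WHITE", "CITY": "NYC", "CLASS": "GOOD"}] * 8
)

premise = {"RACE": "BLACK", "CITY": "NYC"}
decision = {"CLASS": "BAD"}
MIN_SUPPORT = 3  # hypothetical minimum support, chosen for illustration

rule_support = support(records, premise, decision)
rule_confidence = confidence(records, premise, decision)
# Assumption: the rule is kept when its support reaches the minimum support.
is_frequent = rule_support >= MIN_SUPPORT

print(rule_support, rule_confidence, is_frequent)
```

A real analysis would rely on an association rule mining algorithm to enumerate all such rules efficiently instead of checking them one at a time.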
In which circumstances does an extracted rule reveal a (possibly unintentional)
discriminatory decision strategy? The idea here is to weigh the discrimination of a
rule by the gain in confidence due to the presence of the potentially discriminatory
items in the premise of the rule. In the example, we compare the 0.75 confidence
of the rule RACE = BLACK, CITY = NYC → CLASS = BAD with the confidence of the
rule obtained by removing the first item, i.e., CITY = NYC → CLASS = BAD. If, e.g., the
confidence of the latter rule is 0.25, then we conclude that black people in NYC have
a probability of being assigned bad credit that is 3 times that of the
general population of NYC. A measure called elift is used to quantify this
discrimination risk; it is defined as the ratio of the confidences of the two rules
above (with and without the discriminatory item). Whether the rule in the example
is to be considered discriminatory or not can now be assessed by thresholding the
elift measure - possibly according to a value specified in the reference legislation,
that limits the acceptable disproportion of treatment. While we use elift in the
examples throughout the chapter, it is worth noting that several other measures of
discrimination (see Section 5.2.2) have been considered in the legal and economic
literature, none of which is superior to the others. In fact, our approach is
parametric in the definition of a reference measure.
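As a worked instance of the elift ratio described above, the following Python sketch computes it from the two confidences of the running example (0.75 and 0.25). The function names and the threshold value 2.0 are hypothetical; the text does not fix a threshold, which would come from the reference legislation.

```python
def elift(conf_with: float, conf_without: float) -> float:
    """Ratio of the rule's confidence with the discriminatory item
    to its confidence without that item."""
    if conf_without <= 0:
        raise ValueError("confidence of the base rule must be positive")
    return conf_with / conf_without


def is_discriminatory(conf_with: float, conf_without: float,
                      threshold: float) -> bool:
    """Flag the rule when its elift is higher than the threshold."""
    return elift(conf_with, conf_without) > threshold


# Confidences taken from the running example in the text.
value = elift(0.75, 0.25)

# Hypothetical threshold, chosen only for illustration.
flagged = is_discriminatory(0.75, 0.25, threshold=2.0)

print(value, flagged)
```

Because the approach is parametric in the reference measure, `elift` could be swapped for another function of the same two confidences without changing the thresholding step.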
By considering all classification rules with an elift value higher than the
threshold, we can find all the contexts where a discriminatory decision has been
taken: in the example, by enumerating all rules of the form RACE = BLACK, B →
CLASS = BAD, an anti-discrimination analyst discovers all situations B where black
people suffered a discriminatory credit decision, whatever the complexity of the
context B and in compliance with the reference legal framework.
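The enumeration of contexts B can be sketched in Python as follows. This is a brute-force illustration, not an association rule mining algorithm: it tries every combination of values of the context attributes that occurs in the data. The function name `discriminatory_contexts`, the toy records, the values WHITE and LA, and the parameters `min_support=2` and `threshold=2.0` are all assumptions made for the example.

```python
from itertools import combinations


def matches(record, items):
    """True if the record contains every attribute=value item."""
    return all(record.get(attr) == value for attr, value in items.items())


def confidence(records, premise, decision):
    """Confidence of premise -> decision, or None if the premise never holds."""
    covered = [r for r in records if matches(r, premise)]
    if not covered:
        return None
    return sum(1 for r in covered if matches(r, decision)) / len(covered)


def discriminatory_contexts(records, protected, decision, context_attrs,
                            min_support, threshold):
    """Return (context B, elift, support) for every rule protected, B -> decision
    whose support reaches min_support and whose elift is higher than threshold."""
    found = []
    for size in range(1, len(context_attrs) + 1):
        for attrs in combinations(context_attrs, size):
            # Candidate contexts: value combinations that occur in the data.
            for values in {tuple(r[a] for a in attrs) for r in records}:
                context = dict(zip(attrs, values))
                premise = {**protected, **context}
                supp = sum(1 for r in records
                           if matches(r, {**premise, **decision}))
                if supp < min_support:
                    continue
                conf_with = confidence(records, premise, decision)
                conf_without = confidence(records, context, decision)
                if not conf_without:
                    continue  # elift undefined when the base confidence is 0
                ratio = conf_with / conf_without
                if ratio > threshold:
                    found.append((context, ratio, supp))
    return sorted(found, key=lambda item: -item[1])


# Invented toy data: NYC reproduces the 0.75 vs 0.25 confidences of the text.
records = (
    [{"RACE": "BLACK", "CITY": "NYC", "CLASS": "BAD"}] * 3
    + [{"RACE": "BLACK", "CITY": "NYC", "CLASS": "GOOD"}] * 1
    + [{"RACE": "WHITE", "CITY": "NYC", "CLASS": "GOOD"}] * 8
    + [{"RACE": "BLACK", "CITY": "LA", "CLASS": "BAD"}] * 1
    + [{"RACE": "BLACK", "CITY": "LA", "CLASS": "GOOD"}] * 3
    + [{"RACE": "WHITE", "CITY": "LA", "CLASS": "BAD"}] * 1
    + [{"RACE": "WHITE", "CITY": "LA", "CLASS": "GOOD"}] * 3
)

result = discriminatory_contexts(
    records,
    protected={"RACE": "BLACK"},
    decision={"CLASS": "BAD"},
    context_attrs=["CITY"],
    min_support=2,    # hypothetical minimum support
    threshold=2.0,    # hypothetical elift threshold
)
print(result)
```

With these toy records, only the context CITY = NYC passes both the support and the elift filters; the second city is discarded because the rule there is supported by a single decision.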