So far, we have assumed that discriminatory items are recorded in the source
data. This is not always the case: race, for instance, may not be available or even
collectable. What if the discriminatory variables are not directly available? In this
case, indirect discrimination may occur. Consider the rule ZIP=10451, CITY=NYC
→ CLASS=BAD, with confidence 0.95, stating that the residents of a given neighbor-
hood of NYC are assigned bad credit with a 95% chance. On its face, this rule does
not unveil any discriminatory practice. However, assume that the following other
rule can be coded from available information, such as census data: ZIP=10451,
CITY=NYC → RACE=BLACK, with confidence 0.80, stating that 80% of the resi-
dents of that particular neighborhood of NYC are black. Then it is possible to
prove a theoretical lower bound of 0.94 for the confidence of the combined rule
ZIP=10451, CITY=NYC, RACE=BLACK → CLASS=BAD, stating that at least 94% of
black people in that neighborhood are assigned bad credit, around 3.7 times the
rate for the general population of NYC. This reasoning shows that the original rule
unveils a case of redlining.
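The lower bound above follows from a simple counting argument: at most a fraction 1 − 0.95 of the neighborhood's residents avoid bad credit, so at least 0.80 − 0.05 of the residents are both black and assigned bad credit, out of the 0.80 who are black. A minimal sketch of this computation (the function name is ours, not from the chapter):

```python
def combined_rule_lower_bound(conf_class: float, conf_race: float) -> float:
    """Lower bound on the confidence of the combined rule
    (e.g., ZIP, CITY, RACE -> CLASS), given:
      conf_class = confidence of ZIP, CITY -> CLASS
      conf_race  = confidence of ZIP, CITY -> RACE.
    At most (1 - conf_class) of the context avoids CLASS=BAD, so at
    least (conf_race - (1 - conf_class)) of it satisfies both RACE
    and CLASS; dividing by conf_race yields the bound."""
    return max(0.0, (conf_class + conf_race - 1.0) / conf_race)

bound = combined_rule_lower_bound(0.95, 0.80)
print(round(bound, 4))  # 0.9375, i.e., the 0.94 quoted in the text
```

Note that the bound is tight in the worst case: it assumes all residents who are not assigned bad credit are black, which minimizes the confidence of the combined rule.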
Different measures of the discrimination power of the mined decision rules can
be defined, according to the provisions of different anti-discrimination regulations:
e.g., the EU Directives (European Union Legislation, 2011) state that discrimination
on a given attribute occurs when “a higher proportion of people without the attribute
comply or are able to comply” (which we will code as the risk ratio measure), while
the US Equal Pay Act (U.S. Federal Legislation, 2011) states that: “a selection rate
for any race, sex, or ethnic group which is less than four-fifths of the rate for the
group with the highest rate will generally be regarded as evidence of adverse impact”
(which we will code as the selection ratio measure).
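Both measures can be computed from the rates at which each group receives an outcome. As an illustration (the function names and figures are ours, chosen to match the quoted provisions, not the chapter's formal definitions):

```python
def risk_ratio(bad_protected: int, n_protected: int,
               bad_others: int, n_others: int) -> float:
    """Ratio of the bad-outcome rate in the protected group to the
    rate in the remaining population; values well above 1 suggest a
    higher proportion of the protected group fails to obtain the
    benefit (cf. the EU Directives' wording)."""
    return (bad_protected / n_protected) / (bad_others / n_others)

def selection_ratio(selected_group: int, n_group: int,
                    selected_best: int, n_best: int) -> float:
    """Selection rate of a group relative to the group with the
    highest rate; a value below 4/5 is regarded as evidence of
    adverse impact under the US four-fifths rule."""
    return (selected_group / n_group) / (selected_best / n_best)

# Hypothetical figures: 30 of 100 protected applicants are granted
# good credit, versus 60 of 100 in the best-treated group.
sr = selection_ratio(30, 100, 60, 100)
print(round(sr, 2), sr < 4 / 5)  # 0.5 True -> prima facie adverse impact
```

The two measures can disagree on the same data, which is precisely why the analyst should be able to plug in the measure mandated by the applicable regulation.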
Our discrimination discovery approach opens a promising avenue for research,
based on an apparently paradoxical idea: data mining, which is typically used to
create potentially discriminatory profiles and classifications, can also be used the
other way round, as a powerful aid to the anti-discrimination analyst, capable of
automatically discovering the patterns of discrimination that emerge from the avail-
able data with the strongest prima facie evidence. Preliminary experiments on
a dataset of credit decisions made by a German bank show that this method is
able to pinpoint evidence of discrimination: the cited highly discriminatory rule that
"foreign worker women are assigned bad credit among those who intend to buy a
new car" is actually discovered from such a database.
The rest of the chapter is organized as follows. Section 5.2 introduces the tech-
nicalities of classification rules and measures of discrimination defined over them.
Using those tools, we show how the anti-discrimination analyst can go through the
analysis of direct discrimination (Section 5.3), indirect discrimination (Section 5.4),
respondent argumentation (Section 5.5), and affirmative actions (Section 5.6). Some
details on the analytical tool DCUBE, which supports the discrimination discovery
process, are provided in Section 5.7. Finally, we summarize the approach and dis-
cuss some challenging lines for future research.