analysis of past decision records. However, they prove inadequate to cope
with the problem of searching for niches of discriminatory decisions hidden in a
large dataset of decisions.
Discrimination discovery from data consists in the actual discovery of discrim-
inatory situations and practices hidden in a large amount of historical decision
records. The aim is to extract contexts of possible discrimination supported by
legally-grounded measures of the degree of discrimination suffered by protected-
by-law groups in such contexts. Reasoning on the extracted contexts can support all
the actors in an argument about possible discriminatory behaviors. The DSS owner
can use them both to avoid making discriminatory decisions in the future and as a
means to argue against allegations of discriminatory behavior. A complainant in
a case can use them to find specific situations in which there is prima facie
evidence of discrimination against groups she belongs to. Control authorities can base
the fight against discrimination on a formalized process of intelligent data analysis.
However, discrimination discovery from data may prove an extremely difficult
task. The reason is twofold. First, personal data in decision records are typically
high-dimensional: as a consequence, a huge number of possible contexts may, or
may not, be the theater for discrimination. To see this point, consider the case of
gender discrimination in credit approval: although an analyst may observe that no
discrimination occurs in general, it may turn out that female foreign workers obtain
loans to buy a new car only rarely. Many small or large niches that conceal
discrimination may exist, and therefore all possible specific situations should be
considered as candidates, consisting of all possible combinations of variables and variable
values: personal data, demographics, social, economic and cultural indicators, etc.
The anti-discrimination analyst is thus faced with a combinatorial explosion of
possibilities, which makes her work hard: although the task of checking some known
suspicious situations can be conducted using available statistical methods and known
stigmatized groups, the task of discovering niches of discrimination in the data is
unsupported. The second source of complexity is indirect discrimination (see, e.g.,
Tobler, 2008), namely apparently neutral practices that take into account personal
attributes correlated with indicators of race, gender, and other protected grounds and
that result in discriminatory effects on such protected groups. Even when the race
of a credit applicant is not directly recorded in the data, racial discrimination may
occur, e.g., as in the practice of redlining: people living in a certain neighborhood
are frequently denied credit; while not explicitly mentioning race, this fact can be
an indicator of discrimination, if from demographic data we can learn that most of
the people living in that neighborhood belong to the same ethnic minority. Once again,
the anti-discrimination analyst is faced with a large space of possibly discrimina-
tory situations: how can she highlight all interesting discriminatory situations that
emerge from the data, both directly and in combination with further background
knowledge in her possession (e.g., census data)?
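The credit approval example above can be made concrete with a small sketch. The data below are invented for illustration and are not taken from any real dataset; the attribute names (foreign_worker, purpose) and the helper functions are hypothetical. The "denial gap" used here (denial rate of women minus denial rate of men) is only a simple stand-in for the legally-grounded measures mentioned earlier, chosen to show how an aggregate with no gap can hide a niche with a large one.

```python
from itertools import combinations, product

# Hypothetical toy data (not from the source): (approved, denied) counts
# per combination of gender, foreign-worker status and loan purpose.
COUNTS = {
    ("female", "yes", "new_car"): (1, 9),
    ("female", "no", "new_car"): (9, 1),
    ("female", "yes", "furniture"): (9, 1),
    ("female", "no", "furniture"): (9, 1),
    ("male", "yes", "new_car"): (7, 3),
    ("male", "no", "new_car"): (7, 3),
    ("male", "yes", "furniture"): (7, 3),
    ("male", "no", "furniture"): (7, 3),
}
ATTRS = ("foreign_worker", "purpose")  # non-protected context attributes


def build_records(counts):
    """Expand the count table into one dict per decision record."""
    records = []
    for (gender, foreign, purpose), (n_ok, n_denied) in counts.items():
        base = {"gender": gender, "foreign_worker": foreign, "purpose": purpose}
        records += [dict(base, approved=True) for _ in range(n_ok)]
        records += [dict(base, approved=False) for _ in range(n_denied)]
    return records


def denial_rate(records):
    return sum(not r["approved"] for r in records) / len(records)


def contexts(records, attrs):
    """Yield every conjunction of attribute=value pairs, including the empty one."""
    values = {a: sorted({r[a] for r in records}) for a in attrs}
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            for combo in product(*(values[a] for a in subset)):
                yield dict(zip(subset, combo))


def denial_gap(records, context):
    """Denial rate of women minus denial rate of men inside one context."""
    inside = [r for r in records if all(r[a] == v for a, v in context.items())]
    women = [r for r in inside if r["gender"] == "female"]
    men = [r for r in inside if r["gender"] == "male"]
    if not women or not men:
        return None  # the gap is undefined if either group is absent
    return denial_rate(women) - denial_rate(men)


records = build_records(COUNTS)
gaps = []
for ctx in contexts(records, ATTRS):
    gap = denial_gap(records, ctx)
    if gap is not None:
        gaps.append((ctx, gap))

# Print contexts from the largest gap to the smallest.
for ctx, gap in sorted(gaps, key=lambda item: item[1], reverse=True):
    print(ctx or "(all records)", round(gap, 2))
```

In this toy dataset the gap over all records is zero, while the context of foreign workers asking for a new-car loan shows the largest gap. Even with two binary attributes there are already 9 contexts to inspect; with d attributes of v values each, the same enumeration produces (v + 1)^d contexts, which is the combinatorial explosion described above.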
We present a classification rule mining approach for the discrimination discovery
problem, based on the following ideas. Decision policies are induced from past
decision records as classification rules of the form: PREMISES → DECISION, where