of discrimination. While α-protection is defined with reference to elift, its definition
clearly applies to any measure from Figure 5.1. An extension of α-protection to account
for its statistical significance is proposed in (Pedreschi et al., 2009; Ruggieri
et al., 2010c). Also, we refer the reader to (Ruggieri et al., 2010a, 2010c) for the
presentation and experimentation of data mining algorithms able to efficiently extract
α-protective classification rules from a large dataset of historical decision records.
Finally, (Pedreschi et al., 2012) show that the choice of a reference measure from
Figure 5.1 has a critical impact on the ranking imposed over the set of PD
classification rules. In other words, selecting a specific discrimination measure is not a
neutral choice, in that it implies a specific moral criterion to evaluate the
degree of discrimination in a specific context; i.e., different ways to establish how
bad a discriminatory action is. We found it interesting that our quantitative logical
framework for discriminatory rules can help in understanding the consequences of
such choices in law and jurisprudence.
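The dependence of the ranking on the chosen measure can be illustrated with a minimal sketch. It assumes that the measures in question include the risk difference (p1 - p2) and the risk ratio (p1 / p2), where p1 is the rate of the negative decision for the protected group within a context and p2 is the rate for the rest of that context. Figure 5.1 is not reproduced here, so this choice of measures is an assumption, and the two rules and their rates are hypothetical numbers rather than results from the running example.

```python
def risk_difference(p1: float, p2: float) -> float:
    """Absolute gap between the two denial rates."""
    return p1 - p2


def risk_ratio(p1: float, p2: float) -> float:
    """Relative gap between the two denial rates (p2 must be non-zero)."""
    return p1 / p2


# Hypothetical rules: name -> (p1, p2), made-up rates for illustration only.
rules = {
    "rule_1": (0.10, 0.01),  # rare denial, but ten times more frequent
    "rule_2": (0.90, 0.60),  # frequent denial, moderately more frequent
}

# Rank the same rules under each measure, most discriminatory first.
by_rd = sorted(rules, key=lambda r: risk_difference(*rules[r]), reverse=True)
by_rr = sorted(rules, key=lambda r: risk_ratio(*rules[r]), reverse=True)

print("by risk difference:", by_rd)
print("by risk ratio:     ", by_rr)
```

The two measures order the same pair of rules in opposite ways: the risk difference puts rule_2 first, while the risk ratio puts rule_1 first.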
5.3 Direct Discrimination Discovery
From this section on, we formalize various legal concepts in discrimination analysis
and discovery as reasonings over the set of extracted classification rules. We
start by considering direct discrimination, which, according to (Ellis, 2005),
occurs “where one person is treated less favorably than another”. For the purposes of
establishing prima facie evidence in a case before the court, it is enough to show that
only one individual has been treated unfairly in comparison to another. However,
this may be difficult to prove. The complainant may then use aggregate analysis to
establish a regular pattern of unfavorable treatment of the disadvantaged group she
belongs to. This is also the approach that control authorities and internal auditing
may undertake in analysing historical decisions in search of contexts of discrimination
against protected-by-law groups. In direct discrimination, we assume that the
input dataset contains attributes to denote potentially discriminated groups. This is a
reasonable assumption for attributes such as sex and age, or for attributes that can be
explicitly added by control authorities, such as pregnancy status. The next section
will consider the case of attributes not available at all or not even collectable. Under
our assumption, regular patterns of discrimination can then be identified by looking
at PD classification rules of the form:
A, B → BENEFIT = DENIED
i.e., where the consequent consists of denying a benefit (a loan, school admission, a
job, etc.). Rules of the form above are then screened by selecting/ranking those for
which a reference discrimination measure reaches at least a minimum value. In terms of
Def. 4, we are then looking for “α-discrimination of PD classification rules denying benefit”.
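The screening step can be sketched as follows. The sketch assumes the usual definition of extended lift, elift(A, B → C) = conf(A, B → C) / conf(B → C), and assumes that a rule counts as α-discriminatory when its elift is at least α. Def. 4 is not reproduced here, so both points are assumptions. The records, the attribute names, and the threshold value are hypothetical and are not taken from the running example dataset.

```python
from typing import Dict, List

Record = Dict[str, str]
Itemset = Dict[str, str]


def covers(record: Record, itemset: Itemset) -> bool:
    """True if the record contains every attribute=value item of the itemset."""
    return all(record.get(attr) == value for attr, value in itemset.items())


def confidence(records: List[Record], premise: Itemset, consequent: Itemset) -> float:
    """Share of the records covered by the premise that also satisfy the consequent."""
    covered = [r for r in records if covers(r, premise)]
    if not covered:
        raise ValueError("premise covers no records")
    return sum(covers(r, consequent) for r in covered) / len(covered)


def elift(records: List[Record], a: Itemset, b: Itemset, c: Itemset) -> float:
    """Extended lift of A, B -> C with respect to the context B (assumed definition)."""
    base = confidence(records, b, c)
    if base == 0:
        raise ValueError("elift undefined: conf(B -> C) is zero")
    return confidence(records, {**a, **b}, c) / base


# Hypothetical decision records: 4 women (3 denied) and 4 men (1 denied) in city X.
records = (
    [{"sex": "female", "city": "X", "benefit": "denied"}] * 3
    + [{"sex": "female", "city": "X", "benefit": "granted"}] * 1
    + [{"sex": "male", "city": "X", "benefit": "denied"}] * 1
    + [{"sex": "male", "city": "X", "benefit": "granted"}] * 3
)

DENIED = {"benefit": "denied"}
pd_items = {"sex": "female"}   # A: potentially discriminatory itemset
context = {"city": "X"}        # B: context itemset
ALPHA = 1.2                    # hypothetical threshold

score = elift(records, pd_items, context, DENIED)
flagged = score >= ALPHA       # assumed convention for alpha-discrimination
print(score, flagged)
```

On this toy data the rule has confidence 0.75 against a context confidence of 0.5, so its elift is 1.5 and it is flagged under the hypothetical threshold of 1.2. Ranking a set of candidate rules amounts to sorting them by this score.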
As an example, consider our running example dataset and fix the PD items as in
Table 5.1. By ranking classification rules of the form A, B → CLASS = BAD according
to their extended lift measure, we found near the top positions the following:
PERSONAL STATUS = FEMALE, FOREIGN WORKER = YES,