Discrimination discovery from data consists in the actual discovery of discriminatory situations and practices hidden in a large number of historical decision records. The aim is to unveil contexts of possible discrimination on the basis of legally-grounded measures of the degree of discrimination suffered by protected-by-law groups in such contexts. The legal principle of under-representation has inspired existing approaches for discrimination discovery based on pattern mining.
Starting from a dataset of historical decision records, (Pedreschi et al., 2008; Ruggieri et al., 2010a) propose to extract classification rules such as RACE=BLACK, PURPOSE=NEW CAR → CREDIT=NO, called potentially discriminatory (PD) rules,
to unveil contexts (here, people asking for a loan to buy a new car) where the protected group (here, black people) suffered from under-representation with respect to the decision (here, credit denial). The approach has been implemented on top of an Oracle database by relying on tools for frequent itemset mining (Ruggieri et al., 2010b), and extended in (Pedreschi et al., 2009; Ruggieri et al., 2010c; Luong, 2011). The main limitation of the approach is that there is no control of the characteristics (e.g., capacity to repay the loan) of the protected group as opposed to those of the other groups in such contexts. This results in an overly large number of PD rules that need to be further screened.
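To make the measure concrete, one of the legally-grounded measures used in this line of work is the extended lift (elift) of a PD rule A, B → C: the ratio between the confidence of A, B → C and the confidence of B → C, i.e., how much more often the negative decision is taken for the protected group within the context than for the context as a whole. The following is a minimal sketch of computing elift with pandas; the toy dataset and column names are illustrative assumptions, and a real deployment would enumerate frequent rules with an itemset miner as in (Ruggieri et al., 2010b).

```python
import pandas as pd

def elift(df, protected, context, decision):
    """Extended lift of the PD rule A, B -> C:
    elift = conf(A, B -> C) / conf(B -> C), where A identifies the
    protected group, B the context, and C the (negative) decision.
    A value well above 1 means the protected group receives the negative
    decision more often than the context as a whole."""
    def mask(items):                             # rows matching an itemset
        m = pd.Series(True, index=df.index)
        for col, val in items.items():
            m &= df[col] == val
        return m

    a, b, c = mask(protected), mask(context), mask(decision)
    conf_b = (b & c).sum() / b.sum()             # conf(B -> C)
    conf_ab = (a & b & c).sum() / (a & b).sum()  # conf(A, B -> C)
    return conf_ab / conf_b

# Toy data mirroring the running example (column names are assumptions):
loans = pd.DataFrame({
    "race":    ["black", "black", "black", "white", "white", "white"],
    "purpose": ["new_car"] * 6,
    "credit":  ["no", "no", "no", "yes", "no", "yes"],
})
print(elift(loans, {"race": "black"}, {"purpose": "new_car"}, {"credit": "no"}))
# 1.5: in this toy sample, black applicants for new-car loans are denied
# 1.5 times as often as new-car applicants overall.
```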
(Luong et al., 2011) exploit the idea of situation testing. For each member of the protected group with a negative decision outcome, testers with similar characteristics are searched for in a dataset of historical decision records. If one can observe significantly different decision outcomes between the testers of the protected group and the testers of the unprotected group, one can ascribe the negative decision to a bias against the protected group, thus labeling the individual as discriminated against.
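A minimal sketch of this k-NN flavour of situation testing follows; the Euclidean distance, the number of testers k, and the flagging threshold t are illustrative assumptions, not the tuned choices of (Luong et al., 2011).

```python
import numpy as np

def situation_testing(X, protected, decision, k=4, t=0.3):
    """k-NN situation testing in the spirit of (Luong et al., 2011).

    X:         (n, d) array of legally admissible features (no protected items).
    protected: boolean array, True for members of the protected group.
    decision:  boolean array, True for a NEGATIVE outcome (e.g., credit denied).
    Returns a boolean array flagging protected individuals whose negative
    outcome is at least t worse than that of their most similar unprotected
    testers."""
    flagged = np.zeros(len(X), dtype=bool)
    prot_idx, unprot_idx = np.where(protected)[0], np.where(~protected)[0]
    for i in np.where(protected & decision)[0]:
        d = np.linalg.norm(X - X[i], axis=1)    # distance to every record
        d[i] = np.inf                           # exclude the individual itself
        nn_prot = prot_idx[np.argsort(d[prot_idx])][:k]      # protected testers
        nn_unprot = unprot_idx[np.argsort(d[unprot_idx])][:k]  # unprotected testers
        p1 = decision[nn_prot].mean()           # negative rate, protected testers
        p2 = decision[nn_unprot].mean()         # negative rate, unprotected testers
        if p1 - p2 >= t:                        # significantly worse treatment
            flagged[i] = True
    return flagged
```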
The approaches described so far assume that the dataset under analysis contains items denoting the protected groups. This may not be the case when such items are unavailable, or not even collectable at the micro-data level, e.g., as in the case of the loan applicant's race. (Ruggieri et al., 2010a, 2010c) adopt a form of rule inference to cope with the indirect discovery of (either direct or indirect) discrimination.
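The intuition can be illustrated as follows: a rule over an apparently neutral item (say, a ZIP code) can be combined with background knowledge (say, census data) linking that item to a protected group, so that discrimination against the group is inferred even though the race attribute never appears in the decision data. The sketch below merely flags a candidate redlining rule under assumed thresholds; the actual approach of (Ruggieri et al., 2010a, 2010c) derives formal bounds on the discrimination measures of the inferred PD rules.

```python
import pandas as pd

def redlining_candidate(decisions, census, proxy, decision, protected,
                        min_overlap=0.8, min_lift=1.5):
    """Flag the rule proxy -> decision as a candidate for indirect
    discrimination: the proxy group must (i) largely coincide with the
    protected group according to background (census) data and (ii) receive
    the negative decision clearly more often than average.
    proxy, decision, protected are (column, value) pairs; the thresholds
    are illustrative, not legally grounded."""
    pcol, pval = proxy
    dcol, dval = decision
    rcol, rval = protected

    area = census[census[pcol] == pval]
    overlap = (area[rcol] == rval).mean()       # conf(proxy -> protected)

    in_area = decisions[pcol] == pval
    denied = decisions[dcol] == dval
    lift = ((denied & in_area).sum() / in_area.sum()) / denied.mean()

    return overlap >= min_overlap and lift >= min_lift

# Hypothetical use: race is absent from the decision data, present in census.
# redlining_candidate(loans, census, proxy=("zip", "10451"),
#                     decision=("credit", "no"), protected=("race", "black"))
```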
Discrimination prevention in data mining and machine learning consists of extracting models (typically, classifiers) that trade off accuracy for non-discrimination.
In fact, mining from historical data may mean discovering traditional prejudices that are endemic in reality (i.e., taste-based discrimination), or discovering patterns of lower performance, skills or capacities of protected-by-law groups (i.e., statistical discrimination). Mining algorithms may then assign to such discriminatory practices the status of general rules, which are subsequently used for automatic decision making in socially sensitive tasks (see, e.g., N. Cheng et al., 2011; Chien & Chen, 2008; Yap et al., 2011).
Discrimination prevention has been recognized as an issue in the tutorial (Clifton, 2003, Slide 19), where the danger of building classifiers capable of redlining discrimination in home loans was put forward. In predictive statistics, the same issue has been raised by (Pope & Sydnor, 2007). The naïve approach of deleting the attributes that denote protected groups from the original dataset does not prevent a classifier from indirectly learning discriminatory decisions, since other attributes strongly correlated with them could be used as a proxy by the model extraction algorithm.
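As a quick diagnostic of this proxy effect, one can measure the association between each remaining attribute and the protected one before deleting it. A minimal sketch using Cramér's V follows, assuming categorical attributes; the measure and any thresholding on it are one possible choice, not a prescribed method.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def proxy_strength(df, protected_col):
    """Cramér's V between each attribute and the protected attribute.
    Attributes with V close to 1 let a classifier reconstruct the protected
    group even after the protected column itself has been deleted."""
    scores = {}
    for col in df.columns.drop(protected_col):
        table = pd.crosstab(df[col], df[protected_col])
        r, c = table.shape
        if min(r, c) < 2:         # constant column: no association to measure
            continue
        chi2 = chi2_contingency(table)[0]
        n = table.to_numpy().sum()
        scores[col] = np.sqrt(chi2 / (n * (min(r, c) - 1)))
    return pd.Series(scores).sort_values(ascending=False)

# Hypothetical use: rank candidate proxies for race in a credit dataset.
# proxy_strength(loans, "race")  # e.g., "zip" near the top signals redlining risk
```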