Discrimination discovery from data consists in the actual discovery of discriminatory situations and practices hidden in a large number of historical decision records. The aim is to unveil contexts of possible discrimination on the basis of legally-grounded measures of the degree of discrimination suffered by protected-by-law groups in such contexts. The legal principle of under-representation has inspired existing approaches for discrimination discovery based on pattern mining.
Starting from a dataset of historical decision records, (Pedreschi et al., 2008; Ruggieri et al., 2010a) propose to extract classification rules such as RACE=BLACK, PURPOSE=NEW CAR → CREDIT=NO, called potentially discriminatory (PD) rules,
to unveil contexts (here, people asking for a loan to buy a new car) where the protected group (here, black people) suffered from under-representation with respect to the decision (here, credit denial). The approach has been implemented on top of an Oracle database by relying on tools for frequent itemset mining (Ruggieri et al., 2010b), and extended in (Pedreschi et al., 2009; Ruggieri et al., 2010c; Luong, 2011). The main limitation of the approach is that there is no control of the characteristics (e.g., capacity to repay the loan) of the protected group as opposed to those of the other groups in such contexts. This results in an overly large number of PD rules that need to be further screened.
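To make the measure concrete, one of the legally-grounded measures used in this line of work is the extended lift (elift) of a PD rule A, B → C: the ratio between the confidence of A, B → C and the confidence of B → C, i.e., how much more often the negative decision is taken for the protected group within the context than for the context as a whole. The following is a minimal sketch of computing elift with pandas; the toy dataset and column names are illustrative assumptions, and a real deployment would enumerate frequent rules with an itemset miner as in (Ruggieri et al., 2010b).

```python
import pandas as pd

def elift(df, protected, context, decision):
    """Extended lift of the PD rule A, B -> C:
    elift = conf(A, B -> C) / conf(B -> C), where A identifies the
    protected group, B the context, and C the (negative) decision.
    A value well above 1 means the protected group receives the negative
    decision more often than the context as a whole."""
    def mask(items):                             # rows matching an itemset
        m = pd.Series(True, index=df.index)
        for col, val in items.items():
            m &= df[col] == val
        return m

    a, b, c = mask(protected), mask(context), mask(decision)
    conf_b = (b & c).sum() / b.sum()             # conf(B -> C)
    conf_ab = (a & b & c).sum() / (a & b).sum()  # conf(A, B -> C)
    return conf_ab / conf_b

# Toy data mirroring the running example (column names are assumptions):
loans = pd.DataFrame({
    "race":    ["black", "black", "black", "white", "white", "white"],
    "purpose": ["new_car"] * 6,
    "credit":  ["no", "no", "no", "yes", "no", "yes"],
})
print(elift(loans, {"race": "black"}, {"purpose": "new_car"}, {"credit": "no"}))
# 1.5: in this toy sample, black applicants for new-car loans are denied
# 1.5 times as often as new-car applicants overall.
```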
(Luong et al., 2011) exploit the idea of situation testing. For each member of the protected group with a negative decision outcome, testers with similar characteristics are searched for in a dataset of historical decision records. If one can observe significantly different decision outcomes between the testers of the protected group and the testers of the unprotected group, one can ascribe the negative decision to a bias against the protected group, thus labeling the individual as discriminated against.
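A minimal sketch of this k-NN flavour of situation testing follows; the Euclidean distance, the number of testers k, and the flagging threshold t are illustrative assumptions, not the tuned choices of (Luong et al., 2011).

```python
import numpy as np

def situation_testing(X, protected, decision, k=4, t=0.3):
    """k-NN situation testing in the spirit of (Luong et al., 2011).

    X:         (n, d) array of legally admissible features (no protected items).
    protected: boolean array, True for members of the protected group.
    decision:  boolean array, True for a NEGATIVE outcome (e.g., credit denied).
    Returns a boolean array flagging protected individuals whose negative
    outcome is at least t worse than that of their most similar unprotected
    testers."""
    flagged = np.zeros(len(X), dtype=bool)
    prot_idx, unprot_idx = np.where(protected)[0], np.where(~protected)[0]
    for i in np.where(protected & decision)[0]:
        d = np.linalg.norm(X - X[i], axis=1)    # distance to every record
        d[i] = np.inf                           # exclude the individual itself
        nn_prot = prot_idx[np.argsort(d[prot_idx])][:k]      # protected testers
        nn_unprot = unprot_idx[np.argsort(d[unprot_idx])][:k]  # unprotected testers
        p1 = decision[nn_prot].mean()           # negative rate, protected testers
        p2 = decision[nn_unprot].mean()         # negative rate, unprotected testers
        if p1 - p2 >= t:                        # significantly worse treatment
            flagged[i] = True
    return flagged
```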
The approaches described so far assume that the dataset under analysis contains items denoting the protected groups. This may not be the case when such items are unavailable, or not even collectable at the micro-data level, e.g., as in the case of the loan applicant's race. (Ruggieri et al., 2010a, 2010c) adopt a form of rule inference to cope with the indirect discovery of (either direct or indirect) discrimination.
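The intuition can be illustrated as follows: a rule over an apparently neutral item (say, a ZIP code) can be combined with background knowledge (say, census data) linking that item to a protected group, so that discrimination against the group is inferred even though the race attribute never appears in the decision data. The sketch below merely flags a candidate redlining rule under assumed thresholds; the actual approach of (Ruggieri et al., 2010a, 2010c) derives formal bounds on the discrimination measures of the inferred PD rules.

```python
import pandas as pd

def redlining_candidate(decisions, census, proxy, decision, protected,
                        min_overlap=0.8, min_lift=1.5):
    """Flag the rule proxy -> decision as a candidate for indirect
    discrimination: the proxy group must (i) largely coincide with the
    protected group according to background (census) data and (ii) receive
    the negative decision clearly more often than average.
    proxy, decision, protected are (column, value) pairs; the thresholds
    are illustrative, not legally grounded."""
    pcol, pval = proxy
    dcol, dval = decision
    rcol, rval = protected

    area = census[census[pcol] == pval]
    overlap = (area[rcol] == rval).mean()       # conf(proxy -> protected)

    in_area = decisions[pcol] == pval
    denied = decisions[dcol] == dval
    lift = ((denied & in_area).sum() / in_area.sum()) / denied.mean()

    return overlap >= min_overlap and lift >= min_lift

# Hypothetical use: race is absent from the decision data, present in census.
# redlining_candidate(loans, census, proxy=("zip", "10451"),
#                     decision=("credit", "no"), protected=("race", "black"))
```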
Discrimination prevention in data mining and machine learning consists of extracting models (typically, classifiers) that trade off accuracy for non-discrimination.
In fact, mining from historical data may mean discovering traditional prejudices that are endemic in reality (i.e., taste-based discrimination), or discovering patterns of lower performance, skills or capacities of protected-by-law groups (i.e., statistical discrimination). Mining algorithms may then assign to such discriminatory practices the status of general rules, which are subsequently used for automatic decision making in socially sensitive tasks (see, e.g., N. Cheng et al., 2011; Chien & Chen, 2008; Yap et al., 2011).
Discrimination prevention has been recognized as an issue in the tutorial (Clifton, 2003, Slide 19), where the danger of building classifiers capable of redlining discrimination in home loans was put forward. In predictive statistics, the same issue has been raised by (Pope & Sydnor, 2007). The naïve approach of deleting the attributes that denote protected groups from the original dataset does not prevent a classifier from indirectly learning discriminatory decisions, since other attributes strongly correlated with them could be used as a proxy by the model extraction algorithm.
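As a quick diagnostic of this proxy effect, one can measure the association between each remaining attribute and the protected one before deleting it. A minimal sketch using Cramér's V follows, assuming categorical attributes; the measure and any thresholding on it are one possible choice, not a prescribed method.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def proxy_strength(df, protected_col):
    """Cramér's V between each attribute and the protected attribute.
    Attributes with V close to 1 let a classifier reconstruct the protected
    group even after the protected column itself has been deleted."""
    scores = {}
    for col in df.columns.drop(protected_col):
        table = pd.crosstab(df[col], df[protected_col])
        r, c = table.shape
        if min(r, c) < 2:         # constant column: no association to measure
            continue
        chi2 = chi2_contingency(table)[0]
        n = table.to_numpy().sum()
        scores[col] = np.sqrt(chi2 / (n * (min(r, c) - 1)))
    return pd.Series(scores).sort_values(ascending=False)

# Hypothetical use: rank candidate proxies for race in a credit dataset.
# proxy_strength(loans, "race")  # e.g., "zip" near the top signals redlining risk
```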