they are no longer appropriate. Reasons could be, e.g., explicit discrimination, or a change in labeling practice over time. This corresponds to assumption 1 of Section 4.2.1 being violated.
The sampling procedure is biased: the labels are correct and unbiased, but particular groups are under- or overrepresented in the data, leading to incorrect inferences during classifier induction. This corresponds to assumption 2 (first principled way) of Section 4.2.1 being violated.
The data is incomplete; there are hidden attributes: often not all attributes that determine the label are monitored, for example for reasons of privacy or simply because they are difficult to observe. In such a situation it may happen that sensitive attributes are used as a proxy and indirectly lead to discriminatory models, as sketched below. This corresponds to assumption 2 (second principled way) of Section 4.2.1 being violated.
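As a rough illustration of this last point, the following sketch (the attribute names, distributions, and the choice of a decision tree are all made up for the example) trains a model on data in which the attribute that actually determines repayment is never recorded, so the model falls back on a correlated sensitive attribute:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 20_000
gender = rng.integers(0, 2, n)                 # sensitive attribute, 0 or 1
income = rng.normal(45 + 10 * gender, 10, n)   # determines the label, but is hidden
age = rng.integers(20, 65, n)                  # observed, irrelevant here
repay = (income > 50).astype(int)              # correct, unbiased labels

X_observed = np.column_stack([gender, age])    # income is NOT among the recorded attributes
clf = DecisionTreeClassifier(max_depth=3).fit(X_observed, repay)

# Two persons identical in every observed attribute except gender:
print(clf.predict([[0, 40], [1, 40]]))         # typically differs, e.g. [0 1]
```

The different predictions for two persons who agree on every observed attribute arise purely because gender acts as a stand-in for the unobserved income.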
3.3.1 Accuracy and Discrimination
Suppose that the task is to learn a classifier that divides new bank customers into two groups: likely to repay and unlikely to repay. Based on historical data of existing customers and whether or not they repaid their loans, we learn a classifier. A classifier is a mathematical model that allows us to extrapolate from observable attributes such as gender, age, profession, education, income, address, and outstanding loans to make predictions. Recall that the accuracy of a classifier learned on such data is defined as the percentage of predictions of the classifier that are correct. To assess this key performance measure before actually deploying the model in practice, usually some labeled data (i.e., instances for which we already know the outcome) is used that has been put aside for this purpose and not used during the learning process.
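A minimal sketch of this train/test procedure follows; the file name, attribute names, and the choice of a decision tree are merely illustrative assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("customers.csv")               # hypothetical historical, labeled records
X = data[["age", "income", "outstanding_loans"]]  # observable attributes
y = data["repaid"]                                # known outcome (the label)

# Put part of the labeled data aside; it is not used during learning.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)

# Accuracy: the fraction of held-out predictions that are correct.
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```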
Our analysis is based upon the following two assumptions about the classification process.
Assumption 1: The classifier learning process is aimed only at obtaining as high an accuracy as possible. No other objective is pursued during the data mining phase.
Assumption 2: A classifier discriminates with respect to a sensitive attribute, e.g., gender, if for two persons who differ only in their gender (and possibly in some characteristics irrelevant to the classification problem at hand) that classifier predicts different labels.
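In its strictest form (the two persons agree on everything except the sensitive attribute), this definition can be probed directly for a trained model. The sketch below assumes a fitted classifier clf, a pandas data frame X of persons, and a 0/1-encoded gender column; none of these names come from the text above:

```python
import pandas as pd

def discriminates(clf, X: pd.DataFrame, sensitive: str = "gender") -> float:
    """Fraction of persons whose predicted label changes when only the
    sensitive attribute is altered, all other attributes kept fixed."""
    flipped = X.copy()
    flipped[sensitive] = 1 - flipped[sensitive]   # swap the 0/1 encoding
    return (clf.predict(X) != clf.predict(flipped)).mean()

# e.g. discriminates(clf, X_test) > 0 means some pairs of otherwise
# identical persons receive different labels.
```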
Note that the two persons in assumption 2 only need to agree on relevant characteristics. Otherwise one could easily circumvent the definition by claiming that a person was not discriminated against based on gender, but instead because she was wearing a skirt. Although people “wearing a skirt” do not constitute a protected-by-law subpopulation, using such an attribute would be unacceptable given its high correlation with gender and the fact that characteristics such as “wearing a skirt” are considered irrelevant for credit scoring. Often, however, it is far less obvious how to separate relevant from irrelevant attributes. For instance, in a mortgage application an address may at the same time be important to assess the intrinsic value of a property,