Why Unbiased Computational Processes Can Lead to Discriminative Decision Procedures - Discrimination and Privacy in the Information Society


Data Labeling

Third, the historical data used for training a model contains the true labels, which in certain cases may be incorrect and may encode prejudices. Labels are the targets that an organization wants to predict for new incoming instances. The true labels in the historical data may be objective or subjective. Labels are objective when no human interpretation was involved in assigning them; they are hard in the sense that different human observers cannot disagree about their correctness. Examples of objective labels include indicators of whether an existing bank customer repaid a credit, whether a suspect was carrying a concealed weapon, or whether a driver tested positive or negative for alcohol intoxication. Examples of subjective labels include a human resource manager's assessment of whether a job candidate is suitable for a particular job, the decision whether a bank client should get a loan, the acceptance or rejection of a student's application to a university, and the decision whether to detain a suspect. Subjective labels have a gray area in which human judgment may have influenced the labeling, resulting in a bias in the target attribute. In contrast to objective labels, different observers may disagree here: different people may assess a job candidate or a student application differently, so the notion of the correct label is fuzzy.
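The criterion that separates the two kinds of labels, whether independent observers can disagree, can be made concrete with a small sketch. Assuming several observers label the same items, the hypothetical helper `pairwise_agreement` below computes the share of observer pairs that assign the same label. An objective label such as repayment should reach 1.0, whereas a subjective label such as job suitability may fall below it. The data are invented for illustration, and agreement on a sample is only evidence about a label's nature, not proof.

```python
from itertools import combinations


def pairwise_agreement(labels_by_observer):
    """Share of observer pairs that gave the same label, pooled over all items.

    labels_by_observer[o][i] is observer o's label for item i.
    Hypothetical helper for illustration only.
    """
    n_items = len(labels_by_observer[0])
    pairs = list(combinations(range(len(labels_by_observer)), 2))
    agree = 0
    for i in range(n_items):
        for a, b in pairs:
            if labels_by_observer[a][i] == labels_by_observer[b][i]:
                agree += 1
    return agree / (n_items * len(pairs))


# Objective label (invented data): did the customer repay the credit?
# Every observer reads the same repayment record.
repaid = [[1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 1, 1]]

# Subjective label (invented data): is the candidate suitable for the job?
# Three observers judge the same four candidates differently.
suitable = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 0]]

print(pairwise_agreement(repaid))    # 1.0
print(pairwise_agreement(suitable))  # 8 of 12 comparisons agree, about 0.67
```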

The distinction between subjective and objective labels is important for assessing and preventing discrimination. Only subjective labels can be incorrect because of biased decision making in the historical data. For instance, if female applicants have been discriminated against in university admissions, some labels in our database stating whether persons should be admitted will be incorrect according to present non-discriminatory regulations. Objective labels, on the other hand, remain correct even if our database is collected in a biased manner. For instance, we may choose to detain suspects selectively, but the resulting true label, whether a given suspect actually carried a gun, is measurable and is thus objectively correct.
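The admissions example can be sketched in a few lines. The data are invented, and the biased past rule (hypothetical `historical_decision`: every second qualified woman was rejected) is an assumption for illustration. The sketch counts stored labels that contradict a present rule assumed to admit exactly the qualified applicants.

```python
# Invented toy data: (group, qualified) for ten past applicants.
applicants = [("F", True), ("F", True), ("F", False), ("F", True), ("F", True),
              ("M", True), ("M", False), ("M", True), ("M", True), ("M", False)]


def historical_decision(index, group, qualified):
    """Assumed biased past rule: every second qualified woman was rejected."""
    if group == "F" and qualified and index % 2 == 0:
        return False
    return qualified


def count_incorrect_labels(rows):
    """Count stored labels that contradict the present non-discriminatory rule,
    assumed here to be: admit exactly the qualified applicants."""
    incorrect = {"F": 0, "M": 0}
    for index, (group, qualified) in enumerate(rows):
        if historical_decision(index, group, qualified) != qualified:
            incorrect[group] += 1
    return incorrect


print(count_incorrect_labels(applicants))  # {'F': 2, 'M': 0}
```

By contrast, selectively detaining suspects changes which records enter the database, but each stored "carried a gun" label is still a measurement, so no analogous set of incorrect labels arises.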

The computational modeling process requires an insightful analysis of the origins and properties of the training data. Because of the way the data originated, computational models trained on it may rest on incorrect assumptions and, as we will see in the next section, may lead to biased decision making.
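To see how a model that optimizes nothing but accuracy can still discriminate, consider a minimal sketch on invented admission records in which, by assumption, most qualified women were rejected in the past. The hypothetical helper `fit_majority` predicts the most frequent historical label for each (group, qualified) combination; with only discrete features, this rule maximizes accuracy on the training data. As a discrimination measure, the sketch assumes the difference in acceptance rates between men and women (`acceptance_gap`); this is one common choice, not necessarily the measure used in this chapter.

```python
from collections import Counter, defaultdict

# Invented historical records: (group, qualified, admitted_in_past).
# Assumption for illustration: most qualified women were rejected in the past.
records = (
    [("M", True, True)] * 8 + [("M", True, False)] * 2
    + [("F", True, True)] * 3 + [("F", True, False)] * 7
    + [("M", False, False)] * 5 + [("F", False, False)] * 5
)


def fit_majority(rows):
    """Predict the most frequent past label in each (group, qualified) cell.

    With only discrete features, this maximizes accuracy on the training data.
    """
    votes = defaultdict(Counter)
    for group, qualified, label in rows:
        votes[(group, qualified)][label] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}


def accuracy(model, rows):
    # Share of records whose historical label the model reproduces.
    return sum(model[(g, q)] == y for g, q, y in rows) / len(rows)


def acceptance_gap(model, rows):
    # Assumed discrimination measure: P(admit | M) - P(admit | F).
    def rate(group):
        members = [(g, q) for g, q, _ in rows if g == group]
        return sum(model[cell] for cell in members) / len(members)
    return rate("M") - rate("F")


model = fit_majority(records)
# Hypothetical alternative: admit exactly the qualified applicants.
fair = {("M", True): True, ("F", True): True,
        ("M", False): False, ("F", False): False}

print(accuracy(model, records), acceptance_gap(model, records))
print(accuracy(fair, records), acceptance_gap(fair, records))
```

Measured against the historical labels, the learned model scores 25/30 (about 0.83) with an acceptance gap of about 0.67, while the rule that admits exactly the qualified applicants scores only 0.70 with a gap of 0. An accuracy-only objective therefore favors the model that reproduces the historical bias, without any taste for discrimination built into the learner.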

3.3 Types of Problems

In this section we discuss three scenarios that show how violating the assumptions sketched in the previous section may affect the validity of models learned from data and lead to discriminatory decision procedures. In all three scenarios we explicitly assume that the only goal of data mining is to optimize the accuracy of predictions, i.e., there is no incentive to discriminate based on taste. Before going into the scenarios, we first recall the important notion of prediction accuracy and explain how we will assess the discrimination of a classifier. Then we will deal with three scenarios illustrating the following situations:

• Labels are incorrect: due to historical discrimination, the labels are biased. Even though the labels accurately represent decisions of the past, for the future task
