Why Unbiased Computational Processes Can Lead to Discriminative Decision Procedures - Discrimination and Privacy in the Information Society


data mining techniques to model the risk category of customers based on their age,

profession, type of car, and history of accidents. This model can then be used to

advise the agent on pricing when a new client applies for car insurance.

In this chapter we will assume that a data table is given for learning a model,

for example, data about past clients of an insurance company and their claims.

Every row of the table represents an individual case, called an instance. In the

insurance company example, every row could correspond to one historical client.

The instances are described by their characteristics, called attributes or variables.

The attributes of a client could for example be his or her gender, age, years of

driving experience, type of car, and type of insurance policy. For every client the

exact same set of attributes is specified. Usually there is also one special target

attribute, called the class attribute, that the company is interested in predicting. For

the insurance example, this could be whether or not the client has a high accident

risk. The value of this attribute can be determined from the insurance claims

of the client. Clients with a lot of accidents will be in the high risk category, the

others in the low risk category. When a new client arrives, the company wants to

predict his or her risk as accurately as possible, based upon the values of the other

attributes. This process is called classification. For classification we need to model

the dependency of the class attribute on the other attributes. For that purpose many

classification algorithms have been developed in the machine learning, data mining,

and pattern recognition fields, e.g., decision trees, support vector machines, and logistic

regression. For a given classification task, a model that relates the value of the

class attribute to the other attributes needs to be learned on the training data, i.e.,

instances for which the class attribute is known. A learned model for a given task

could be for example a set of rules such as:

IF Gender=male and car type=sport THEN risk=high.

Once a model is learned, it can be deployed to classify new instances for which

the class attribute is unknown. The process of learning a classifier from training

data is often referred to as classifier induction. For a more detailed overview of

classifiers and how they can be derived from historical data, see Chapter 2.
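
As a rough illustration of classifier induction, the following Python sketch learns a small rule set from a toy data table and then classifies new instances. It is not the method of any particular algorithm named above: the table contents, the attribute names (gender, car_type, risk) and the helper functions induce_rules and classify are all hypothetical, and the "learning" step simply predicts the majority class for each combination of attribute values.

```python
from collections import Counter, defaultdict

# Hypothetical training data: each dict is one instance (a past client).
# The attribute names and values are invented for illustration only.
TRAINING_DATA = [
    {"gender": "male", "car_type": "sport", "risk": "high"},
    {"gender": "male", "car_type": "sport", "risk": "high"},
    {"gender": "male", "car_type": "family", "risk": "low"},
    {"gender": "female", "car_type": "sport", "risk": "low"},
    {"gender": "female", "car_type": "family", "risk": "low"},
    {"gender": "female", "car_type": "family", "risk": "high"},
    {"gender": "female", "car_type": "family", "risk": "low"},
]

ATTRIBUTES = ("gender", "car_type")  # descriptive attributes
CLASS_ATTRIBUTE = "risk"             # target (class) attribute


def induce_rules(instances):
    """Learn one rule per observed attribute-value combination.

    Each rule predicts the majority class among the training instances
    that share that combination. A default class (the overall majority)
    is kept for combinations never seen during training.
    """
    counts = defaultdict(Counter)
    for inst in instances:
        key = tuple(inst[a] for a in ATTRIBUTES)
        counts[key][inst[CLASS_ATTRIBUTE]] += 1
    rules = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    overall = Counter(inst[CLASS_ATTRIBUTE] for inst in instances)
    default = overall.most_common(1)[0][0]
    return rules, default


def classify(model, instance):
    """Predict the class attribute of a new instance with the learned rules."""
    rules, default = model
    key = tuple(instance[a] for a in ATTRIBUTES)
    return rules.get(key, default)


model = induce_rules(TRAINING_DATA)
new_client = {"gender": "male", "car_type": "sport"}
print(classify(model, new_client))
```

On this toy table, the rule induced for male clients with a sport car corresponds to the example rule given above. Note that the learned rules depend on gender only because the historical data do; the induction step itself contains no reference to any particular attribute.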

In this chapter we will show that data mining and classifier induction can lead

to problems similar to those that arise with human decision makers, including basing decisions

upon discriminatory generalizations. This can be particularly harmful since data

mining methods are often seen as solidly based upon statistics and hence purely

rational and without prejudice. Discrimination is the prejudiced treatment of an

individual based on their membership in a certain group or category. In most European

and North American countries, it is forbidden by law to discriminate

against certain legally protected groups (see Chapter 4 of this book for an overview).

Although we do not explicitly refer to the anti-discrimination legislation of

a particular country, most of our examples will directly relate to EU directives and

legislation. The European Union has some of the strongest anti-discrimination legislation

(see, e.g., Directive 2000/43/EC, Directive 2000/78/EC, Directive

2002/73/EC, Article 21 of the Charter of Fundamental Rights, and Protocol

12/Article 14 of the European Convention on Human Rights), describing discrimination

on the basis of race, ethnicity, religion, nationality, gender, sexuality,
