data mining techniques to model the risk category of customers based on their age,
profession, type of car, and history of accidents. This model can then be used to
advise the agent on pricing when a new client applies for car insurance.
In this chapter we will assume that a data table is given for learning a model,
for example, data about past clients of an insurance company and their claims.
Every row of the table represents an individual case, called an instance. In the
insurance company example, every row could correspond to one historical client.
The instances are described by their characteristics, called attributes or variables.
The attributes of a client could, for example, be his or her gender, age, years of
driving experience, type of car, and type of insurance policy. For every client the
exact same set of attributes is specified. Usually there is also one special target
attribute, called the class attribute, that the company is interested in predicting. For
the insurance example, this could be whether or not the client has a high
accident risk. The value of this attribute can be determined from the insurance claims
of the client. Clients with a lot of accidents will be in the high risk category, the
others in the low risk category. When a new client arrives, the company wants to
predict his or her risk as accurately as possible, based upon the values of the other
attributes. This process is called classification. For classification we need to model
the dependency of the class attribute on the other attributes. For that purpose many
classification algorithms have been developed in the fields of machine learning, data
mining, and pattern recognition, e.g., decision trees, support vector machines, and
logistic regression. For a given classification task, a model that relates the value of the
class attribute to the other attributes needs to be learned on the training data, i.e.,
instances for which the class attribute is known. A learned model for a given task
could be for example a set of rules such as:
IF Gender=male and car type=sport THEN risk=high.
Once a model is learned, it can be deployed for classifying new instances for which
the class attribute is unknown. The process of learning a classifier from training
data is often referred to as classifier induction. For a more detailed overview of
classifiers and how they can be derived from historical data, see Chapter 2.
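
As a rough illustration of the terminology above, the following Python sketch builds a tiny table of instances, derives the class attribute from each client's claims, induces simple IF-THEN rules, and classifies a new client. Everything in it is an assumption made for illustration and is not taken from the chapter: the data values, the claims threshold of 3, and the names `TRAINING_DATA`, `risk_label`, `induce_rules`, and `classify`. The majority-vote rule learner is a deliberately naive stand-in for the real algorithms mentioned above (decision trees, support vector machines, logistic regression).

```python
from collections import Counter, defaultdict

# Hypothetical training data: each dict is one instance (a historical client).
# "gender" and "car_type" are attributes; "claims" is used to derive the class.
TRAINING_DATA = [
    {"gender": "male", "car_type": "sport", "claims": 4},
    {"gender": "male", "car_type": "sport", "claims": 3},
    {"gender": "male", "car_type": "family", "claims": 0},
    {"gender": "female", "car_type": "sport", "claims": 1},
    {"gender": "female", "car_type": "family", "claims": 0},
    {"gender": "female", "car_type": "family", "claims": 3},
    {"gender": "female", "car_type": "family", "claims": 1},
]

ATTRIBUTES = ("gender", "car_type")
CLAIMS_THRESHOLD = 3  # assumed cut-off for "a lot of accidents"


def risk_label(instance):
    """Derive the class attribute (risk) from the client's number of claims."""
    return "high" if instance["claims"] >= CLAIMS_THRESHOLD else "low"


def induce_rules(instances, attributes):
    """Naive classifier induction: for every observed combination of
    attribute values, predict the majority class among matching instances."""
    counts = defaultdict(Counter)
    for inst in instances:
        key = tuple(inst[a] for a in attributes)
        counts[key][risk_label(inst)] += 1
    rules = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    # Fallback for attribute combinations never seen in the training data.
    default = Counter(risk_label(i) for i in instances).most_common(1)[0][0]
    return rules, default


def classify(instance, attributes, rules, default):
    """Predict the class attribute of a new instance whose class is unknown."""
    key = tuple(instance[a] for a in attributes)
    return rules.get(key, default)


if __name__ == "__main__":
    rules, default = induce_rules(TRAINING_DATA, ATTRIBUTES)
    for values, risk in sorted(rules.items()):
        condition = " and ".join(f"{a}={v}" for a, v in zip(ATTRIBUTES, values))
        print(f"IF {condition} THEN risk={risk}")
    new_client = {"gender": "male", "car_type": "sport"}
    print("new client:", classify(new_client, ATTRIBUTES, rules, default))
```

On this made-up table the learner produces a rule of the same form as the example rule above, with gender appearing in the condition. That kind of dependence on a sensitive attribute is what the rest of the chapter is concerned with.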
In this chapter we will show that data mining and classifier induction can lead
to problems similar to those of human decision makers, including basing decisions
upon discriminatory generalizations. This can be particularly harmful since data
mining methods are often seen as solidly based upon statistics and hence purely
rational and without prejudice. Discrimination is the prejudiced treatment of an
individual based on their membership in a certain group or category. In most
European and North American countries, it is forbidden by law to discriminate
against certain protected groups (see Chapter 4 of this topic for an
overview). Although we do not explicitly refer to the anti-discrimination legislation of
a particular country, most of our examples will directly relate to EU directives and
legislation. The European Union has some of the strongest anti-discrimination
legislation (see, e.g., Directive 2000/43/EC, Directive 2000/78/EC, Directive
2002/73/EC, Article 21 of the Charter of Fundamental Rights, and Protocol
12/Article 14 of the European Convention on Human Rights), describing
discrimination on the basis of race, ethnicity, religion, nationality, gender, sexuality,