of discrimination-aware data-mining in Section 3. Afterwards, we discuss our discrimination-aware techniques applied to the Naive Bayes classifier in Section 4.
In Section 5, we discuss the effects of our techniques on positive discrimination.
Section 6 concludes the chapter.
14.2 The Naive Bayes Classifier
We already gave an intuitive introduction to the Naive Bayes classifier in the introduction. In this section, we provide a more in-depth discussion of this classifier, introducing the background needed to understand the proposed adaptations that make the model discrimination-free. The Naive Bayes classifier is a simple probabilistic model that assumes all attributes are independent of each other given the class attribute; see, e.g., (Bishop, 2006). For example, when predicting whether someone has a high or low income (class attribute), the age of a person correlates with the type of position (s)he occupies. A Naive Bayes classifier assumes that once the income is known, these two attributes are independent: age no longer correlates with position when considering only people with a high (low) income. Formally, a Naive Bayes model computes the following probability function 1:
P(C) P(A_1|C) P(A_2|C) … P(A_n|C)
In this formula, C is the class attribute and A_1, A_2, …, A_n are all other attributes. P(C) is a probability function over the class values, and P(A_i|C) is a probability function over the values of attribute A_i given the class value. Due to the independence assumption, the total probability function (or model) P(C, A_1, A_2, …, A_n) can be computed simply by multiplying the individual probabilities of the class and of each attribute given the class. We now show, using an example, how to estimate these probability functions and use them as a classifier.
Example
Suppose we are given a data-set consisting of 100 people, 40 of whom are female and 60 male. We would like to predict whether a new person is likely to have a high or a low income based on this data. In the data-set, 20 males and 10 females have a high income. This results in the following probability functions:
P(high income) = 30/100 = 0.3, P(low income) = 70/100 = 0.7
P(male|high income) = 20/30 ≈ 0.67, P(female|high income) = 10/30 ≈ 0.33
P(male|low income) = 40/70 ≈ 0.57, P(female|low income) = 30/70 ≈ 0.43
In addition, suppose we know the education of these people and that this attribute results in the following probability functions:
P(university|high) = 0.5, P(high school|high) = 0.33, P(none|high) = 0.17
P(university|low) = 0.07, P(high school|low) = 0.57, P(none|low) = 0.36
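The estimation and classification steps of this example can be sketched in Python. This is a minimal illustration, not the chapter's implementation: the attribute names and the test person are made up for the sketch, the sex probabilities are re-derived from the stated counts, and the education probabilities are taken as given in the text rather than estimated.

```python
# Naive Bayes scoring with the example's numbers: 100 people,
# 30 with high income and 70 with low income.
priors = {"high": 30 / 100, "low": 70 / 100}

# Conditional probabilities P(value | income), keyed per attribute.
cond = {
    "sex": {
        ("male", "high"): 20 / 30, ("female", "high"): 10 / 30,
        ("male", "low"): 40 / 70, ("female", "low"): 30 / 70,
    },
    "education": {  # taken as given in the text
        ("university", "high"): 0.5, ("high school", "high"): 0.33,
        ("none", "high"): 0.17,
        ("university", "low"): 0.07, ("high school", "low"): 0.57,
        ("none", "low"): 0.36,
    },
}

def score(person, income):
    """Unnormalized P(C) * prod_i P(A_i | C) for one class value."""
    p = priors[income]
    for attribute, value in person.items():
        p *= cond[attribute][(value, income)]
    return p

def classify(person):
    """Return the class value with the highest score."""
    return max(priors, key=lambda income: score(person, income))

person = {"sex": "female", "education": "university"}
# score("high") = 0.3 * (10/30) * 0.5  = 0.05
# score("low")  = 0.7 * (30/70) * 0.07 = 0.021
print(classify(person))  # prints "high"
```

Note that the scores need not sum to one; since the same normalizing constant would divide both of them, comparing the unnormalized products already yields the prediction.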
1 We disregard normalizing constants. Note that this formulation is consistent with the one
used in the introduction, as we can easily move from comparing products to sums via the
logarithm.
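The footnote's point, that comparing products of probabilities is equivalent to comparing sums of their logarithms, can be checked with a small sketch; the probability values below are the ones from the example:

```python
import math

# The logarithm is strictly increasing, so taking logs turns the
# product P(C) * prod_i P(A_i|C) into a sum without changing which
# class value attains the maximum.
high = [0.3, 10 / 30, 0.5]   # P(high), P(female|high), P(university|high)
low = [0.7, 30 / 70, 0.07]   # P(low),  P(female|low),  P(university|low)

prod_high, prod_low = math.prod(high), math.prod(low)
sum_high = sum(math.log(p) for p in high)
sum_low = sum(math.log(p) for p in low)

# Both comparisons give the same decision.
assert (prod_high > prod_low) == (sum_high > sum_low)
```

Working with sums of logs is also the numerically safer choice in practice, as products of many small probabilities quickly underflow.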