of discrimination-aware data-mining in Section 3. Afterwards, we discuss our discrimination-aware techniques applied to the Naive Bayes classifier in Section 4.
In Section 5, we discuss the effects of our techniques on positive discrimination.
Section 6 concludes the chapter.
14.2 The Naive Bayes Classifier
We already gave an intuitive introduction to the Naive Bayes classifier in the introduction. In this section, we provide a more in-depth discussion of this classifier, introducing the background needed to understand the proposed adaptations that make the model discrimination-free. The Naive Bayes classifier is a simple probabilistic model that assumes all attributes are independent of each other given the class attribute; see, e.g., (Bishop, 2006). For example, when predicting whether someone has a high or low income (class attribute), the age of a person correlates with the type of position (s)he occupies. A Naive Bayes classifier assumes that once the income is known, these two attributes are independent: age no longer correlates with position when considering only people with a high (low) income. Formally, a Naive Bayes model computes the following probability function 1:
P(C) P(A_1|C) P(A_2|C) … P(A_n|C)
In this formula, C is the class attribute and A_1, A_2, …, A_n are all other attributes. P(C) is a probability function over the class values, and P(A_i|C) is a probability function over the values of attribute A_i given the class value. Due to the independence assumption, the total probability function (or model) P(C, A_1, A_2, …, A_n) can be computed simply by multiplying the individual probabilities of the class and of each attribute given the class. We now show, using an example, how to estimate these probability functions and use them as a classifier.
Example
Suppose we are given a data-set consisting of 100 people, 40 of whom are female and 60 male. We would like to predict whether a new person is likely to have a high or a low income based on this data. In the data-set, 20 males and 10 females have a high income. This results in the following probability functions:
P(high income) = 30/100 = 0.3, P(low income) = 70/100 = 0.7
P(male|high income) = 20/30 ≈ 0.67, P(female|high income) = 10/30 ≈ 0.33
P(male|low income) = 40/70 ≈ 0.57, P(female|low income) = 30/70 ≈ 0.43
In addition, suppose we know the education of these people and that this attribute results in the following probability functions:
P(university|high) = 0.5, P(high school|high) = 0.33, P(none|high) = 0.17
P(university|low) = 0.07, P(high school|low) = 0.57, P(none|low) = 0.36
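The estimation and classification steps of this example can be sketched in Python. This is a minimal illustration, not the chapter's implementation: the attribute names and the test person are made up for the sketch, the sex probabilities are re-derived from the stated counts, and the education probabilities are taken as given in the text rather than estimated.

```python
# Naive Bayes scoring with the example's numbers: 100 people,
# 30 with high income and 70 with low income.
priors = {"high": 30 / 100, "low": 70 / 100}

# Conditional probabilities P(value | income), keyed per attribute.
cond = {
    "sex": {
        ("male", "high"): 20 / 30, ("female", "high"): 10 / 30,
        ("male", "low"): 40 / 70, ("female", "low"): 30 / 70,
    },
    "education": {  # taken as given in the text
        ("university", "high"): 0.5, ("high school", "high"): 0.33,
        ("none", "high"): 0.17,
        ("university", "low"): 0.07, ("high school", "low"): 0.57,
        ("none", "low"): 0.36,
    },
}

def score(person, income):
    """Unnormalized P(C) * prod_i P(A_i | C) for one class value."""
    p = priors[income]
    for attribute, value in person.items():
        p *= cond[attribute][(value, income)]
    return p

def classify(person):
    """Return the class value with the highest score."""
    return max(priors, key=lambda income: score(person, income))

person = {"sex": "female", "education": "university"}
# score("high") = 0.3 * (10/30) * 0.5  = 0.05
# score("low")  = 0.7 * (30/70) * 0.07 = 0.021
print(classify(person))  # prints "high"
```

Note that the scores need not sum to one; since the same normalizing constant would divide both of them, comparing the unnormalized products already yields the prediction.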
1 We disregard normalizing constants. Note that this formulation is consistent with the one
used in the introduction, as we can easily move from comparing products to sums via the
logarithm.
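The footnote's point, that comparing products of probabilities is equivalent to comparing sums of their logarithms, can be checked with a small sketch; the probability values below are the ones from the example:

```python
import math

# The logarithm is strictly increasing, so taking logs turns the
# product P(C) * prod_i P(A_i|C) into a sum without changing which
# class value attains the maximum.
high = [0.3, 10 / 30, 0.5]   # P(high), P(female|high), P(university|high)
low = [0.7, 30 / 70, 0.07]   # P(low),  P(female|low),  P(university|low)

prod_high, prod_low = math.prod(high), math.prod(low)
sum_high = sum(math.log(p) for p in high)
sum_low = sum(math.log(p) for p in low)

# Both comparisons give the same decision.
assert (prod_high > prod_low) == (sum_high > sum_low)
```

Working with sums of logs is also the numerically safer choice in practice, as products of many small probabilities quickly underflow.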