Classification: Basic Concepts - Data Mining: Concepts and Techniques


case of probability values of zero. This technique for probability estimation is known as the Laplacian correction or Laplace estimator, named after Pierre Laplace, a French mathematician who lived from 1749 to 1827. If we have, say, q counts to which we each add one, then we must remember to add q to the corresponding denominator used in the probability calculation. We illustrate this technique in Example 8.5.

Example 8.5 Using the Laplacian correction to avoid computing probability values of zero. Suppose that for the class buys_computer = yes in some training database, D, containing 1000 tuples, we have 0 tuples with income = low, 990 tuples with income = medium, and 10 tuples with income = high. The probabilities of these events, without the Laplacian correction, are 0, 0.990 (from 990/1000), and 0.010 (from 10/1000), respectively. Using the Laplacian correction for the three quantities, we pretend that we have 1 more tuple for each income-value pair. In this way, we instead obtain the following probabilities (rounded to three decimal places):

1/1003 = 0.001, 991/1003 = 0.988, and 11/1003 = 0.011,

respectively. The “corrected” probability estimates are close to their “uncorrected”

counterparts, yet the zero probability value is avoided.
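
The arithmetic in Example 8.5 can be reproduced with a short Python sketch. The function name `laplace_corrected` and the `pseudocount` parameter are illustrative assumptions, not names from the text; the counts are those given in the example.

```python
def laplace_corrected(counts, pseudocount=1):
    """Return Laplace-corrected probability estimates.

    counts maps each attribute value to its observed tuple count.
    Adding `pseudocount` to each of the q counts means the denominator
    must grow by pseudocount * q, as the text notes.
    """
    q = len(counts)                                  # number of counts corrected
    denominator = sum(counts.values()) + pseudocount * q
    return {value: (count + pseudocount) / denominator
            for value, count in counts.items()}


# Counts for the class buys_computer = yes from Example 8.5.
income_counts = {"low": 0, "medium": 990, "high": 10}

corrected = laplace_corrected(income_counts)
for value, probability in corrected.items():
    print(f"income = {value}: {probability:.3f}")
```

With one pseudo-tuple per income value, the denominator becomes 1000 + 3 = 1003, and no estimate is zero.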

8.4 Rule-Based Classification

In this section, we look at rule-based classifiers, where the learned model is represented as a set of IF-THEN rules. We first examine how such rules are used for classification (Section 8.4.1). We then study ways in which they can be generated, either from a decision tree (Section 8.4.2) or directly from the training data using a sequential covering algorithm (Section 8.4.3).

8.4.1 Using IF-THEN Rules for Classification

Rules are a good way of representing information or bits of knowledge. A rule-based classifier uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form

IF condition THEN conclusion.

An example is rule R1,

R1: IF age = youth AND student = yes THEN buys_computer = yes.

The “IF” part (or left side) of a rule is known as the rule antecedent or precondition. The “THEN” part (or right side) is the rule consequent. In the rule antecedent, the condition consists of one or more attribute tests (e.g., age = youth and student = yes)
