case of probability values of zero. This technique for probability estimation is known as
the Laplacian correction or Laplace estimator, named after Pierre Laplace, a French
mathematician who lived from 1749 to 1827. If we add one to each of, say, q counts,
then we must remember to add q to the corresponding denominator used in
the probability calculation. We illustrate this technique in Example 8.5.
Example 8.5 Using the Laplacian correction to avoid computing probability values of zero. Suppose
that for the class buys_computer = yes in some training database, D, containing
1000 tuples, we have 0 tuples with income = low, 990 tuples with income = medium, and
10 tuples with income = high. The probabilities of these events, without the Laplacian
correction, are 0, 0.990 (from 990/1000), and 0.010 (from 10/1000), respectively. Using
the Laplacian correction for the three quantities, we pretend that we have 1 more tuple
for each income-value pair. In this way, we instead obtain the following probabilities
(rounded to three decimal places):
1/1003 = 0.001, 991/1003 = 0.988, and 11/1003 = 0.011,
respectively. The “corrected” probability estimates are close to their “uncorrected”
counterparts, yet the zero probability value is avoided.
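The arithmetic of Example 8.5 can be checked with a short sketch. The function name laplace_estimates and the dictionary layout are assumptions made for illustration only; they do not come from the text. The sketch adds one to each of the q counts and q to the denominator.

```python
from fractions import Fraction


def laplace_estimates(counts):
    """Return add-one (Laplacian-corrected) probability estimates.

    counts maps each attribute value to its observed tuple count.
    One is added to each of the q counts, and q is added to the denominator.
    (Hypothetical helper, named here for illustration.)
    """
    q = len(counts)                       # number of distinct values
    total = sum(counts.values()) + q      # corrected denominator
    return {value: Fraction(n + 1, total) for value, n in counts.items()}


# Counts from Example 8.5 for the class buys_computer = yes.
income_counts = {"low": 0, "medium": 990, "high": 10}

for value, p in laplace_estimates(income_counts).items():
    print(f"income = {value}: {p.numerator}/{p.denominator} = {float(p):.3f}")
```

Exact fractions are used so that the corrected estimates can be seen to still sum to 1, and rounding happens only when printing.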
8.4 Rule-Based Classification
In this section, we look at rule-based classifiers, where the learned model is represented
as a set of IF-THEN rules. We first examine how such rules are used for classification
(Section 8.4.1). We then study ways in which they can be generated, either from a decision
tree (Section 8.4.2) or directly from the training data using a sequential covering
algorithm (Section 8.4.3).
8.4.1 Using IF-THEN Rules for Classification
Rules are a good way of representing information or bits of knowledge. A rule-based
classifier uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression
of the form
IF condition THEN conclusion.
An example is rule R1,
R1: IF age = youth AND student = yes THEN buys_computer = yes.
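As a minimal sketch, rule R1 can be written as a small data structure with a condition and a conclusion. The names Rule, matches, and apply_rule, and the sample tuples, are hypothetical and chosen only for illustration. The sketch assumes the condition is a conjunction (AND) of attribute-equals-value tests.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """IF condition THEN conclusion (hypothetical representation)."""
    condition: dict    # attribute -> required value; all tests are ANDed
    conclusion: tuple  # (class attribute, predicted value)

    def matches(self, tup):
        # The condition holds only if every attribute test is satisfied.
        return all(tup.get(attr) == val for attr, val in self.condition.items())


def apply_rule(rule, tup):
    """Return the rule's conclusion if its condition holds, else None."""
    return rule.conclusion if rule.matches(tup) else None


# R1: IF age = youth AND student = yes THEN buys_computer = yes
R1 = Rule({"age": "youth", "student": "yes"}, ("buys_computer", "yes"))

# Hypothetical tuples, made up for this example.
print(apply_rule(R1, {"age": "youth", "student": "yes", "income": "low"}))
print(apply_rule(R1, {"age": "senior", "student": "yes", "income": "low"}))
```

The first tuple satisfies both attribute tests, so the conclusion is returned; the second fails the age test, so the rule says nothing about it.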
The “IF” part (or left side) of a rule is known as the rule antecedent or precondition.
The “THEN” part (or right side) is the rule consequent. In the rule antecedent, the
condition consists of one or more attribute tests (e.g., age = youth and student = yes)
 