Table 1. Algorithm for generating a random rule
1. Randomly select an example x from the training set.
2. Randomly select an attribute a for which the value of a for x (a_x) is not unknown.
3. If a is categorical, form the rule IF a = a_x THEN c, where c is the most frequent class in the cases covered by a = a_x.
4. Otherwise (if a is ordinal), form the rule IF a # a_x THEN c, where # is a random selection between ≤ and ≥, and c is the most frequent class in the cases covered by a # a_x.
as well as whether they apply to the type of rule generated in standard machine
learning applications. We used rules generated by C4.5rules (release 8) [9], as an
exemplar of a machine learning system for classification rule generation.
One difficulty with employing rules formed by C4.5rules is that the system
uses a complex resolution system to determine which of several rules should be
employed to classify a case covered by more than one rule. As this is taken into
account during the induction process, taking a rule at random and considering
it in isolation may not be representative of its application in practice. We de-
termined that the first listed rule was least affected by this process, and hence
employed it. However, this caused a difficulty in that the first listed rule usually
covers few training cases and hence estimates of its likely test error can be ex-
pected to have low accuracy, reducing the likely strength of the effect predicted
by Hypothesis 2.
For this reason we also employed the C4.5rules rule with the highest cover on
the training set. We recognized that this would be unrepresentative of the rule's
actual deployment, as in practice cases that it covered would frequently be clas-
sified by the ruleset as belonging to other classes. Nonetheless, we believed that
it provided an interesting exemplar of a form of rule employed in data mining.
To explore the wider scope of the hypotheses we also generated random rules
using the algorithm in Table 1.
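As a concrete illustration, the following Python sketch implements the procedure of Table 1 under an assumed data representation: each training case is a dict of attribute values plus a 'cls' label, None marks an unknown value, and the attribute names and the CATEGORICAL/ORDINAL sets are hypothetical. It is not the authors' implementation.

```python
import random
from collections import Counter

# Assumed (illustrative) attribute typing; not taken from the paper.
CATEGORICAL = {"outlook"}
ORDINAL = {"temperature"}

def random_rule(train):
    """Generate a random rule following the steps of Table 1 (sketch only)."""
    x = random.choice(train)                                  # step 1
    known = [a for a in sorted(CATEGORICAL | ORDINAL) if x[a] is not None]
    a = random.choice(known)                                  # step 2
    v = x[a]
    if a in CATEGORICAL:                                      # step 3: IF a = a_x
        test = lambda case: case[a] == v
        desc = f"IF {a} = {v}"
    else:                                                     # step 4: IF a # a_x, # in {<=, >=}
        op = random.choice(["<=", ">="])
        if op == "<=":
            test = lambda case: case[a] is not None and case[a] <= v
        else:
            test = lambda case: case[a] is not None and case[a] >= v
        desc = f"IF {a} {op} {v}"
    covered = [case for case in train if test(case)]
    c = Counter(case["cls"] for case in covered).most_common(1)[0][0]
    return desc + f" THEN {c}", test

train = [
    {"outlook": "sunny",    "temperature": 30, "cls": "no"},
    {"outlook": "sunny",    "temperature": 25, "cls": "no"},
    {"outlook": "rain",     "temperature": 18, "cls": "yes"},
    {"outlook": "overcast", "temperature": 22, "cls": "yes"},
]
print(random_rule(train)[0])   # e.g. "IF temperature >= 25 THEN no"
```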
From the initial rule, formed by one of these three processes, we developed a most specific rule. The most specific rule was created by collecting all training cases covered by the initial rule and then forming the most specific rule that covered those cases. For a categorical attribute a this rule included a clause a ∈ X, where X is the set of values for the attribute of cases in the random selection. For ordinal attributes, the rule included a clause of the form x ≤ a ≤ z, where x is the lowest value and z the highest value for the attribute in the random sample.
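Continuing the same illustrative representation (a hypothetical sketch, not the authors' code), the most specific rule can be built from the cases covered by the initial rule: a value-set clause a ∈ X for each categorical attribute and a min-max interval x ≤ a ≤ z for each ordinal attribute.

```python
# Assumed attribute typing, as in the previous sketch (illustrative only).
CATEGORICAL = {"outlook"}
ORDINAL = {"temperature"}

def most_specific_rule(initial_test, train):
    """Form the most specific rule covering all training cases covered by
    the initial rule, represented here as a predicate on cases."""
    covered = [case for case in train if initial_test(case)]
    clauses = {}
    for a in CATEGORICAL:
        # clause: a ∈ X, with X the observed values among the covered cases
        clauses[a] = {case[a] for case in covered if case[a] is not None}
    for a in ORDINAL:
        vals = [case[a] for case in covered if case[a] is not None]
        if vals:
            # clause: x <= a <= z, with x/z the lowest/highest observed value
            clauses[a] = (min(vals), max(vals))
    return clauses

def covers(clauses, case):
    """True if the conjunction of clauses covers the case."""
    for a, clause in clauses.items():
        if a in CATEGORICAL:
            if case[a] not in clause:
                return False
        else:
            lo, hi = clause
            if case[a] is None or not (lo <= case[a] <= hi):
                return False
    return True
```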
Next we found the set of all most general rules: those rules R formed by deleting clauses from the most specific rule S such that cover(R) = cover(S) and there is no rule T that can be formed by deleting a clause from R such that cover(T) = cover(R). The search for the set of most general rules was performed using the OPUS complete search algorithm [10].
Then we formed the:
Random Most General Rule: a single rule selected at random from the most
general rules.
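For intuition only, the sketch below finds the most general rules by brute-force enumeration over subsets of the most specific rule's clauses, then selects a Random Most General Rule; the paper's search uses the OPUS algorithm [10], so this is an illustrative stand-in that reuses the clause representation and covers() predicate from the previous sketch.

```python
import random
from itertools import combinations

def most_general_rules(S, train, covers):
    """All rules R formed by deleting clauses from the most specific rule S
    such that cover(R) = cover(S) and no single clause can be deleted from R
    without changing the cover. Exhaustive search; the paper uses OPUS [10]."""
    def cover(R):
        return frozenset(i for i, case in enumerate(train) if covers(R, case))

    target = cover(S)
    attrs = list(S)
    # every subset of S's clauses that preserves the cover of S
    candidates = []
    for k in range(len(attrs) + 1):
        for kept in combinations(attrs, k):
            R = {a: S[a] for a in kept}
            if cover(R) == target:
                candidates.append(R)
    # drop any candidate that has a cover-preserving proper subset of clauses,
    # leaving only the most general rules
    return [R for R in candidates
            if not any(set(other) < set(R) for other in candidates)]

def random_most_general_rule(S, train, covers):
    """Random Most General Rule: a single rule selected at random."""
    return random.choice(most_general_rules(S, train, covers))
```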