notion of other distinguishing evidence to allow for the real-world knowledge
acquisition context in which evidence other than that contained in the data
may be brought to bear upon the rule selection problem.
We present two hypotheses relating to classification rules W → y and Z → y learned from real-world data such that W ⊂ Z and NODE(W → y, Z → y).
1. Pr(|ε(W → y, D′) − ε(true → y, D′)| < |ε(Z → y, D′) − ε(true → y, D′)|) > Pr(|ε(W → y, D′) − ε(true → y, D′)| > |ε(Z → y, D′) − ε(true → y, D′)|). That is, the error of the more general rule, W → y, on unseen data will tend to be closer to the proportion of cases in the domain that do not belong to class y than will the error of the more specific rule, Z → y.
2. Pr(|ε(W → y, D′) − ε(W → y, D)| > |ε(Z → y, D′) − ε(Z → y, D)|) > Pr(|ε(W → y, D′) − ε(W → y, D)| < |ε(Z → y, D′) − ε(Z → y, D)|). That is, the error of the more specific rule, Z → y, on unseen data will tend to be closer to the proportion of negative training cases covered by the two rules¹ than will the error of the more general rule, W → y.
Another way of stating these two hypotheses is that, of two rules with identical empirical and other support,
1. the more general can be expected to exhibit classification error closer to that of a default rule, true → y, that is, of assuming all cases belong to the class, and
2. the more specific can be expected to exhibit classification error closer to that observed on the training data.
It is important to clarify at the outset that we are not claiming that the more general rule will invariably have generalization error closer to that of the default rule, nor that the more specific rule will invariably have generalization error closer to the error observed on the training data. Rather, we are claiming that relative generality is a source of evidence that, in the absence of alternative evidence, provides reasonable grounds for believing that each of these effects is more likely than the contrary.
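To make these tendencies concrete, here is a small Monte Carlo sketch. It is our illustration, not the authors' experiment: the synthetic distribution, the rules W → y (antecedent a = 1) and Z → y (antecedent a = 1 ∧ b = 1), and all sample sizes are assumptions, chosen so that identical empirical support can be enforced by rejection sampling and so that the region covered only by the more general rule behaves like the domain at large, one simple reading of the no-other-distinguishing-evidence condition.

```python
import random

random.seed(1)

# Synthetic domain (an assumption for illustration, not the paper's setup).
# Attributes a, b are independent Bernoulli; the class y depends on (a, b).
# Rule W -> y covers a == 1 (more general);
# rule Z -> y covers a == 1 and b == 1 (more specific).
def draw(n):
    cases = []
    for _ in range(n):
        a = random.random() < 0.4
        b = random.random() < 0.5
        y = random.random() < (0.9 if (a and b) else 0.4)
        cases.append((a, b, y))
    return cases

def covers_W(c): return c[0]            # antecedent: a == 1
def covers_Z(c): return c[0] and c[1]   # antecedent: a == 1 and b == 1

def err(covers, data):
    covered = [c for c in data if covers(c)]
    if not covered:
        return None                     # error rate undefined: nothing covered
    return sum(1 for c in covered if not c[2]) / len(covered)

h1 = h2 = trials = 0
while trials < 2000:
    D = draw(8)                         # small training sample
    # Identical empirical support: keep only samples on which W and Z
    # cover exactly the same training cases (cf. the footnote).
    if any(covers_W(c) != covers_Z(c) for c in D):
        continue
    e_train = err(covers_Z, D)          # identical for both rules here
    if e_train is None:
        continue
    Dp = draw(400)                      # unseen data D'
    eW, eZ = err(covers_W, Dp), err(covers_Z, Dp)
    e_default = sum(1 for c in Dp if not c[2]) / len(Dp)  # error of true -> y
    if eW is None or eZ is None:
        continue
    trials += 1
    h1 += abs(eW - e_default) < abs(eZ - e_default)  # hypothesis (1)
    h2 += abs(eZ - e_train) < abs(eW - e_train)      # hypothesis (2)

print(f"hypothesis (1) direction observed in {h1 / trials:.0%} of trials")
print(f"hypothesis (2) direction observed in {h2 / trials:.0%} of trials")
```

Under this setup both frequencies should come out well above one half, which is all the hypotheses assert: each effect is more likely than the contrary, not invariable.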
Observation. With simple assumptions, hypotheses (1) and (2) can be shown to be trivially true given that D and D′ are i.i.d. samples from a single finite distribution 𝒟.
Proof.
1. For any rule X → y and test set D′, ε(X → y, D′) = ε(X → y, X(D′)), as X → y only covers the instances X(D′) of D′.
2. ε(Z → y, D′) = (E(Z → y, Z(D′ ∩ D)) + E(Z → y, Z(D′ − D))) / |Z(D′)|.
3. ε(W → y, D′) = (E(W → y, W(D′ ∩ D)) + E(W → y, W(D′ − D))) / |W(D′)|.
4. Z(D′) ⊆ W(D′), because Z is a specialization of W.
¹ Recall that both rules have identical empirical support and hence cover the same training cases.
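Steps 1–3 simply split the covered portion of the test set across cases that were and were not seen in training, so the decomposition is easy to check numerically. The sketch below is our illustration; it assumes E(X → y, S) counts the misclassifications the rule makes on instance set S, which is what the surrounding formulas suggest.

```python
import random

random.seed(2)

# Numeric check of the decomposition in proof steps 2 and 3, assuming
# E(X -> y, S) is the number of errors X -> y makes on instance set S.
# Instances carry an id so that D' ∩ D and D' − D are well defined.
def make_case(i):
    a = random.random() < 0.4
    b = random.random() < 0.5
    y = random.random() < (0.9 if (a and b) else 0.4)
    return (i, a, b, y)

pool = [make_case(i) for i in range(1000)]  # a finite domain to sample from
D    = random.sample(pool, 200)             # training sample D
Dp   = random.sample(pool, 300)             # test sample D'

covers = lambda c: c[1]                     # rule X -> y with antecedent a == 1

def E(cases):                               # errors among covered instances
    return sum(1 for c in cases if covers(c) and not c[3])

D_ids     = {c[0] for c in D}
overlap   = [c for c in Dp if c[0] in D_ids]      # D' ∩ D
fresh     = [c for c in Dp if c[0] not in D_ids]  # D' − D
n_covered = sum(1 for c in Dp if covers(c))       # |X(D')|

direct = E(Dp) / n_covered                    # step 1: errors over X(D')
split  = (E(overlap) + E(fresh)) / n_covered  # steps 2-3: partition of D'
assert direct == split                        # partitioning changes nothing
print(direct)
```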