Data Mining of Association Rules and the Process of Knowledge Discovery in Databases - Advances in Data Mining

on correlations and do not necessarily imply causation. To further illustrate this
point, let us look at the following example:

Let P(X) be the probability that a transaction T from database D contains
the itemset X. Let P(X, Y) be the probability that both X and Y are contained
in T ∈ D. Now let X and Y be stochastically independent:

P(X) · P(Y) = P(X, Y).

Since conf(X → Y) = P(X, Y) / P(X), the confidence of the rule X → Y then becomes

conf(X → Y) = P(X) · P(Y) / P(X) = P(Y).

This simple observation shows a severe shortcoming of the support-confidence
framework. As soon as the itemset Y occurs comparatively often in the data, the
rule X → Y also has a high confidence value. This suggests a dependency of Y
on X although in fact both itemsets are stochastically independent. To cope
with this problem, additional rule quality measures have been developed.
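The effect can be reproduced on a small, made-up dataset. The following Python sketch is only an illustration: the items "bread" and "milk", the 20 transactions, and the helper names supp and conf are assumptions chosen for this example, not part of the original text. The data are constructed so that both itemsets are exactly independent, yet the rule still reaches a confidence of 0.8 simply because "milk" is frequent.

```python
from fractions import Fraction

def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def conf(x, y, transactions):
    """conf(X -> Y) = supp(X u Y) / supp(X)."""
    both = frozenset(x) | frozenset(y)
    return supp(both, transactions) / supp(x, transactions)

# Hypothetical data: P(bread) = 1/2, P(milk) = 4/5, P(bread, milk) = 2/5,
# so bread and milk are stochastically independent by construction.
transactions = (
    [frozenset({"bread", "milk"})] * 8
    + [frozenset({"bread"})] * 2
    + [frozenset({"milk"})] * 8
    + [frozenset()] * 2
)

p_x = supp({"bread"}, transactions)
p_y = supp({"milk"}, transactions)
p_xy = supp({"bread", "milk"}, transactions)

print(p_x * p_y == p_xy)                        # True: independent
print(conf({"bread"}, {"milk"}, transactions))  # 4/5, equal to P(milk)
```

Exact fractions are used instead of floats so that the independence check is not affected by rounding.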

Lift (Interest) [7,19]

lift(X → Y) = conf(X → Y) / P(Y) = conf(X → Y) / supp(Y)

Lift directly addresses the above problem by expressing the deviation of the
rule confidence from P(Y). In the case of stochastic independence, lift = 1.
In contrast, a value higher than 1 means that the existence of X as part
of a transaction "lifts" the probability for this transaction to also contain Y by
the factor lift. The opposite is true for lift values lower than 1. Lift is symmetric
and therefore is an undirected measure.
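As a minimal sketch of this definition, the Python snippet below computes lift on a small invented dataset and checks its symmetry. The item names, the ten transactions, and the helper names supp and lift are assumptions made for illustration only.

```python
from fractions import Fraction

def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def lift(x, y, transactions):
    """lift(X -> Y) = conf(X -> Y) / supp(Y)."""
    both = frozenset(x) | frozenset(y)
    confidence = supp(both, transactions) / supp(x, transactions)
    return confidence / supp(y, transactions)

# Hypothetical data: bread and milk co-occur more often than chance predicts.
transactions = (
    [frozenset({"bread", "milk"})] * 4
    + [frozenset({"bread"})] * 1
    + [frozenset({"milk"})] * 1
    + [frozenset()] * 4
)

print(lift({"bread"}, {"milk"}, transactions))  # 8/5
print(lift({"milk"}, {"bread"}, transactions))  # 8/5, same value in both directions
```

Here P(milk) = 1/2, while the confidence of bread → milk is 4/5, so the presence of bread raises the probability of milk by the factor 8/5.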

Conviction [7]

conv(X → Y) = P(X) · P(¬Y) / P(X, ¬Y)

Let P(¬Y) be the probability of a transaction T ∈ D with Y ⊈ T, and P(X, ¬Y)
the probability of drawing a transaction out of D that contains X but not Y.
conv(X → Y) now expresses to what extent X and ¬Y are stochastically independent;
a value of 1 corresponds to independence. High values for conv(X → Y), up to ∞
where P(X, ¬Y) = 0, express the conviction that this rule represents a causation.
It is important to note that conv is not symmetric and therefore is a directed measure.
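The following Python sketch evaluates the conviction formula on a small invented dataset and shows that swapping antecedent and consequent changes the value. The item names, the transactions, and the function name conviction are assumptions for illustration; returning math.inf when P(X, ¬Y) = 0 is one possible way to represent the ∞ case.

```python
import math
from fractions import Fraction

def conviction(x, y, transactions):
    """conv(X -> Y) = P(X) * P(not Y) / P(X, not Y)."""
    x, y = frozenset(x), frozenset(y)
    n = len(transactions)
    # P(X): transaction contains all items of X.
    p_x = Fraction(sum(1 for t in transactions if x <= t), n)
    # P(not Y): transaction does not contain all items of Y.
    p_not_y = Fraction(sum(1 for t in transactions if not (y <= t)), n)
    # P(X, not Y): transaction contains X but not Y.
    p_x_not_y = Fraction(
        sum(1 for t in transactions if x <= t and not (y <= t)), n
    )
    if p_x_not_y == 0:
        return math.inf  # assumption: the rule is never violated in the data
    return p_x * p_not_y / p_x_not_y

# Hypothetical data with different marginals for bread and milk.
transactions = (
    [frozenset({"bread", "milk"})] * 4
    + [frozenset({"bread"})] * 1
    + [frozenset({"milk"})] * 3
    + [frozenset()] * 2
)

print(conviction({"bread"}, {"milk"}, transactions))  # 3/2
print(conviction({"milk"}, {"bread"}, transactions))  # 7/6
```

The two directions differ (3/2 versus 7/6), which illustrates why conviction is a directed measure.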

3 The Process of Knowledge Discovery

Practical experience has shown that discovering knowledge from huge databases
requires much more than simply applying a sophisticated data mining algorithm
to a predefined dataset. In fact, people from research and practice increasingly
understand knowledge discovery in databases (KDD) as
