on correlations and do not necessarily imply causation. To further illustrate this
point let us look at the following example:
Let P(X) be the probability that a transaction T from database D contains
the itemset X. Let P(X, Y) be the probability that both X and Y are contained
in T ∈ D. Now let X and Y be stochastically independent:
P(X) · P(Y) = P(X, Y).
Since conf(X → Y) = P(X, Y) / P(X), the confidence of the rule X → Y is then
conf(X → Y) = P(Y).
This simple observation shows a severe shortcoming of the support-confidence
framework. As soon as the itemset Y occurs comparatively often in the data, the
rule X → Y also has a high confidence value. This suggests a dependency of Y
on X although in fact both itemsets are stochastically independent. To cope
with this problem, additional rule quality measures have been developed.
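The effect can be reproduced on a small toy database. The following is a minimal sketch; the helper names `supp` and `conf`, the items "x" and "y", and the ten transactions are illustrative assumptions chosen so that X and Y are exactly independent.

```python
from fractions import Fraction

def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def conf(x, y, transactions):
    """conf(X -> Y) = supp(X and Y together) / supp(X); assumes supp(X) > 0."""
    return supp(set(x) | set(y), transactions) / supp(x, transactions)

# Hypothetical database with P(X) = 1/2, P(Y) = 4/5, P(X, Y) = 2/5 = P(X) * P(Y)
transactions = (
    [frozenset({"x", "y"})] * 4
    + [frozenset({"x"})] * 1
    + [frozenset({"y"})] * 4
    + [frozenset()] * 1
)

p_x = supp({"x"}, transactions)
p_y = supp({"y"}, transactions)
p_xy = supp({"x", "y"}, transactions)

print(p_x * p_y == p_xy)                # True: X and Y are independent
print(conf({"x"}, {"y"}, transactions)) # 4/5, identical to P(Y)
```

Although X carries no information about Y here, the rule X → Y reaches a confidence of 0.8 simply because Y is frequent.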
Lift (Interest) [7,19]
lift(X → Y) = conf(X → Y) / P(Y) = conf(X → Y) / supp(Y)
Lift directly addresses the above problem by expressing the deviation of the
rule confidence from P(Y). In the case of stochastic independence, lift = 1
holds. In contrast, a value higher than 1 means that the presence of X in a
transaction "lifts" the probability that this transaction also contains Y by
the factor lift. The opposite is true for lift values lower than 1. Lift is
symmetric and is therefore an undirected measure.
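As a hedged illustration, lift can be computed directly from absolute transaction counts. The function name `lift`, its parameters (`n_xy`, `n_x`, `n_y`, `n`) and the counts used below are assumptions made for this sketch only.

```python
from fractions import Fraction

def lift(n_xy, n_x, n_y, n):
    """lift(X -> Y) = conf(X -> Y) / supp(Y), from absolute counts.

    n_xy: transactions containing both X and Y
    n_x, n_y: transactions containing X, respectively Y
    n: total number of transactions
    """
    conf_xy = Fraction(n_xy, n_x)  # conf(X -> Y) = P(X, Y) / P(X)
    supp_y = Fraction(n_y, n)      # supp(Y) = P(Y)
    return conf_xy / supp_y

# Independent itemsets: P(X) = 1/2, P(Y) = 4/5, P(X, Y) = 2/5
independent = lift(n_xy=4, n_x=5, n_y=8, n=10)

# Positively correlated itemsets: P(X) = 1/2, P(Y) = 3/5, P(X, Y) = 2/5
forward = lift(n_xy=4, n_x=5, n_y=6, n=10)   # lift(X -> Y)
backward = lift(n_xy=4, n_x=6, n_y=5, n=10)  # lift(Y -> X), roles swapped

print(independent)        # 1
print(forward, backward)  # 4/3 4/3
```

Swapping the roles of X and Y leaves the value unchanged, which is the symmetry noted above.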
Conviction [7]
conv(X → Y) = P(X) · P(¬Y) / P(X, ¬Y)
Let P(¬Y) be the probability of a transaction T ∈ D with Y ⊈ T, and P(X, ¬Y)
the probability of drawing a transaction out of D that contains X but not Y.
conv(X → Y) now expresses to what extent X and ¬Y are stochastically
independent. High values of conv(X → Y), up to ∞ where P(X, ¬Y) = 0, express
the conviction that this rule represents a causation. It is important to note
that conv is not symmetric and is therefore a directed measure.
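The following minimal sketch computes conviction from absolute counts and shows its directedness. The function name `conviction`, its parameters and the example counts are assumptions for illustration; returning `math.inf` when P(X, ¬Y) = 0 is one possible convention for the unbounded case.

```python
import math
from fractions import Fraction

def conviction(n_xy, n_x, n_y, n):
    """conv(X -> Y) = P(X) * P(not Y) / P(X, not Y), from absolute counts."""
    p_x = Fraction(n_x, n)
    p_not_y = Fraction(n - n_y, n)
    p_x_not_y = Fraction(n_x - n_xy, n)  # transactions with X but without Y
    if p_x_not_y == 0:
        return math.inf                  # the rule is never violated
    return p_x * p_not_y / p_x_not_y

# Independent itemsets: P(X) = 1/2, P(Y) = 4/5, P(X, Y) = 2/5
independent = conviction(n_xy=4, n_x=5, n_y=8, n=10)

# Correlated itemsets: P(X) = 1/2, P(Y) = 3/5, P(X, Y) = 2/5
forward = conviction(n_xy=4, n_x=5, n_y=6, n=10)   # conv(X -> Y)
backward = conviction(n_xy=4, n_x=6, n_y=5, n=10)  # conv(Y -> X)

# Every transaction containing X also contains Y
never_violated = conviction(n_xy=5, n_x=5, n_y=8, n=10)

print(independent)        # 1
print(forward, backward)  # 2 3/2
print(never_violated)     # inf
```

In this example conv(X → Y) = 2 while conv(Y → X) = 3/2, so, unlike lift, the value depends on the direction of the rule.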
3 The Process of Knowledge Discovery
Practical experience has shown that discovering knowledge from huge databases
requires much more than simply applying a sophisticated data mining algorithm
to a predefined dataset. In fact, researchers and practitioners increasingly
understand knowledge discovery in databases (KDD) as