Database Reference
In-Depth Information
positive itemsets are discovered, candidate negative itemsets are considered based
on the taxonomy used. They are considered interesting if their support is sufficiently
different than the expected support. Association rules are generated from the negative
itemsets if the interestingness measure of the rule exceeds a given threshold. The type
of the rules discovered with this method are implications of the form A
B . The
issue with this approach is that it is hard to generalize since it is domain dependant
and requires a predefined taxonomy. However, it should be noted that taxonomies
exist for certain applications, thus making this method useful. A similar approach is
described in [ 25 ].
Wu et al. [ 24 ] derived another algorithm for generating both positive and nega-
tive association rules. The negative association discovered in this paper are of the
following forms:
⇒¬
Y . They add on top of the
support-confidence framework another measure called mininterest for a better prun-
ing of the frequent itemsets generated (the argument is that a rule A
¬
X
Y , X
⇒¬
Y and
¬
X
⇒¬
B is of
interest only if supp ( A
mininterest ). “Mininterest”
parameter is used to assess the dependency between the two itemsets considered,
A and B are not independent if they satisfy the condition. The authors consider as
itemsets of interest those itemsets that exceed minimum support and minimum inter-
est thresholds. Although [ 24 ] introduces the “mininterest” parameter, the authors do
not discuss how to set it and what would be the impact on the results when changing
this parameter.
The algorithm proposed in [ 20 , 21 ], named SRM (substitution rule mining), dis-
covers a subset of negative associations. The authors develop an algorithm to discover
negative associations of the type X ⇒¬ Y . These association rules can be used to
discover which items are substitutes for others in market basket analysis. Their al-
gorithm discovers first what they call concrete items , which are those itemsets that
have a high chi-square value and exceed the expected support. Once these itemsets
are discovered, they compute the correlation coefficient for each pair of them. From
those pairs that are negatively correlated, they extract the desired rules (of the type
X
B )
supp ( A ) supp ( B )
Y , where Y is considered as an atomic item). Although interesting for the
substitution items application, SRM is limited in the kind of rules that it can discover.
Antonie and Zaïane [ 7 ] proposed an algorithm to mine strong positive and negative
association rules based on the Person's φ correlation coefficient. For the association
rule X
⇒¬
Y , its φ correlation coefficient is as follows:
¬
¬
¬
¬
s ( XY ) s (
X
Y )
s ( X
Y ) s (
XY )
=
( s ( X ) s (
φ
(6.2)
¬
X ) s ( Y ) s (
¬
Y ))
In their algorithm, itemset and rule generation are combined and the relevant rules are
generated on-the-fly while analyzing the correlations within each candidate itemset.
This avoids evaluating item combinations redundantly. For each generated candidate
itemset, all possible combinations of items are computed to analyze their correla-
tions. In the end, only those rules generated from item combinations with strong
correlations are considered. The strength of the correlation is indicated by a cor-
relation threshold, either given as input or by default set to 0.5. If the correlation
between item combinations X and Y of an itemset XY , where X and Y are itemsets,
Search WWH ::




Custom Search