A particular scenario of interest is one in which the patterns to be mined are very
long. In such cases, the number of subsets of frequent patterns can be extremely
large, so specialized techniques are needed to mine very long patterns. A variety of
methods are used to explore the long patterns early, so that their subsets can be
pruned effectively. The scenario of long pattern
generation is discussed in detail in Chap. 4, though it is also discussed to some extent
in the earlier Chaps. 2 and 3.
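
To get a sense of scale, the count can be worked out directly. The sketch below assumes
the standard downward-closure property (every subset of a frequent pattern is itself
frequent), which this passage does not state explicitly; the length k = 40 is an
arbitrary illustrative value.

```latex
\[
\#\{\text{non-empty subsets of a pattern with } k \text{ items}\} = 2^{k} - 1,
\qquad
k = 40 \;\Rightarrow\; 2^{40} - 1 = 1{,}099{,}511{,}627{,}775 \approx 1.1 \times 10^{12}.
\]
```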
2.2 Interesting and Negative Frequent Patterns
A major challenge in frequent pattern mining is that the rules found may often not
be very interesting when measures such as support and confidence are used.
This is because such measures do not normalize for the original frequency of
the underlying items. For example, an item that occurs very rarely in the underlying
database will naturally also occur in itemsets with low frequency. The absolute
frequency therefore often does not tell us much about the likelihood of items to
co-occur, because of the biases associated with the frequencies of the individual
items. For this reason, numerous methods have been proposed in the literature for
finding interesting frequent patterns that normalize for the underlying item
frequencies [6, 26]. Methods for finding interesting frequent patterns are discussed
in Chap. 5. The issue of interestingness is also related to compressed representations
of patterns such as closed or maximal itemsets. These issues are also discussed in
that chapter.
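
The effect of normalizing for item frequencies can be illustrated with a small sketch.
It uses lift, one common normalized measure that the text above does not name, so its
use here is an assumption rather than the method of [6, 26]. The item names, the
transaction counts, and the helper names (count, support, lift, toy_transactions) are
all hypothetical.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def count(transactions: List[Transaction], items: Iterable[str]) -> int:
    """Number of transactions that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(1 for t in transactions if wanted <= t)


def support(transactions: List[Transaction], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    return count(transactions, items) / len(transactions)


def lift(transactions: List[Transaction], a: str, b: str) -> float:
    """Observed co-occurrence of a and b divided by the co-occurrence
    expected if the two items were statistically independent."""
    n = len(transactions)
    both = count(transactions, [a, b])
    return n * both / (count(transactions, [a]) * count(transactions, [b]))


def toy_transactions() -> List[Transaction]:
    """Hypothetical data: 100 transactions, two common items, two rare ones."""
    return (
        [frozenset({"Milk", "Bread"})] * 60
        + [frozenset({"Milk", "Bread", "Caviar", "Vodka"})] * 4
        + [frozenset({"Milk"})] * 15
        + [frozenset({"Milk", "Vodka"})] * 1
        + [frozenset({"Bread"})] * 16
        + [frozenset()] * 4
    )


if __name__ == "__main__":
    data = toy_transactions()
    for a, b in [("Milk", "Bread"), ("Caviar", "Vodka")]:
        print(a, b, support(data, [a, b]), lift(data, a, b))
```

In this toy data the pair {Milk, Bread} has high support (0.64) but a lift of 1.0, so
the two items co-occur exactly as often as independence would predict. The pair
{Caviar, Vodka} has low support (0.04) but a lift of 20, which a support threshold
alone would not reveal.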
In negative association rule mining, we attempt to determine rules such as
Bread ⇒ ¬Butter, where the symbol ¬ indicates negation. In this case, ¬Butter
becomes a pseudo-item denoting a “negative item.” One possibility is to add negative
items to the data, and perform the mining in the same way as one would determine
rules in the support-confidence framework. However, this is not a feasible solution,
because traditional support frameworks are not designed for cases where an item is
present in the data 98% of the time, as is typical for “negative items.” For example,
most transactions may not contain the item Butter, and therefore even positively
correlated items may appear as negative rules. The rule Bread ⇒ ¬Butter may have
confidence greater than 50%, even though Bread is clearly correlated in a positive
way with Butter. This is because the item ¬Butter may have an even higher support
of 98%.
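
The pitfall of adding negative items can be reproduced on a small synthetic data set.
The following is a minimal sketch under stated assumptions: the transaction counts are
invented so that ¬Butter has the 98% support mentioned above, and the helper names
(add_negative_items, count, support, confidence, toy_market) are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set

Transaction = FrozenSet[str]
NEG = "¬"  # prefix marking a negative pseudo-item


def add_negative_items(transactions: List[Transaction],
                       universe: Set[str]) -> List[Transaction]:
    """Add a pseudo-item such as '¬Butter' for every item of `universe`
    that is absent from a transaction."""
    return [
        frozenset(t | {NEG + item for item in universe - t})
        for t in transactions
    ]


def count(transactions: List[Transaction], items: Iterable[str]) -> int:
    """Number of transactions that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(1 for t in transactions if wanted <= t)


def support(transactions: List[Transaction], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    return count(transactions, items) / len(transactions)


def confidence(transactions: List[Transaction], lhs: str, rhs: str) -> float:
    """Confidence of the rule lhs => rhs."""
    return count(transactions, [lhs, rhs]) / count(transactions, [lhs])


def toy_market() -> List[Transaction]:
    """Hypothetical data: 1000 transactions, Butter present in only 2%."""
    return (
        [frozenset({"Bread", "Butter"})] * 10
        + [frozenset({"Bread"})] * 90
        + [frozenset({"Butter"})] * 10
        + [frozenset()] * 890
    )


if __name__ == "__main__":
    data = add_negative_items(toy_market(), {"Bread", "Butter"})
    print("support(¬Butter)      =", support(data, [NEG + "Butter"]))
    print("conf(Bread => ¬Butter) =", confidence(data, "Bread", NEG + "Butter"))
    print("conf(Bread => Butter)  =", confidence(data, "Bread", "Butter"))
    print("support(Butter)       =", support(data, ["Butter"]))
```

In this toy data the rule Bread ⇒ ¬Butter has a confidence of 0.90, well above 50%,
yet a transaction containing Bread is five times as likely to contain Butter (0.10)
as an arbitrary transaction (0.02). The high confidence of the negative rule reflects
only the 98% support of ¬Butter, not a negative correlation.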
The issue of finding negative patterns is closely related to that of finding interesting
patterns in the data [6] because one is looking for patterns that satisfy the support
requirement in an interesting way. This relationship between the two problems tends
to be under-emphasized in the literature, and the problem of negative pattern mining is
often treated independently from interesting pattern mining. Some frameworks, such
as collective strength, are designed to address both issues simultaneously. Methods
for negative pattern mining are addressed in Chap. 6. The relationship between
interesting pattern mining and negative pattern mining will be discussed in the same
chapter.