An Introduction to Frequent Pattern Mining - Frequent Pattern Mining

A particular scenario of interest is one in which the patterns to be mined are very
long. In such cases, the number of subsets of frequent patterns can be extremely
large, because every subset of a frequent pattern is itself frequent. Specialized
techniques are therefore needed in order to mine very long patterns. A variety of
methods are used to explore the long patterns early, so that their subsets can be
pruned effectively. The scenario of long pattern generation is discussed in detail in
Chap. 4, though it is also discussed to some extent in the earlier Chaps. 2 and 3.

2.2 Interesting and Negative Frequent Patterns

A major challenge in frequent pattern mining is that the rules found are often not
very interesting when they are quantified only by measures such as support and
confidence. This is because such measures do not normalize for the original frequency
of the underlying items. For example, an item that occurs very rarely in the underlying
database would naturally also occur in itemsets with lower frequency. Therefore, the
absolute frequency often does not tell us much about the likelihood of items to
co-occur, because of the biases associated with the frequencies of the individual
items. Numerous methods have therefore been proposed in the literature for finding
interesting frequent patterns that normalize for the underlying item frequencies
[6, 26]. Methods for finding interesting frequent patterns are discussed in Chap. 5.
The issue of interestingness is also related to compressed representations of patterns
such as closed or maximal itemsets. These issues are also discussed in that chapter.
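
The effect of item frequencies on support and confidence can be illustrated with a small sketch. The toy transactions below are invented for illustration, and lift (confidence divided by the support of the consequent) is used here only as one familiar way of normalizing for item frequency; it is not necessarily the measure proposed in [6, 26].

```python
from fractions import Fraction

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence normalized by the base frequency of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical data: Milk is in 9 of 10 transactions, Bread in 5 of 10.
transactions = (
    [frozenset({"Bread", "Milk"})] * 4
    + [frozenset({"Bread"})] * 1
    + [frozenset({"Milk"})] * 5
)

conf_bread_milk = confidence(transactions, {"Bread"}, {"Milk"})
lift_bread_milk = lift(transactions, {"Bread"}, {"Milk"})

print("confidence(Bread => Milk) =", conf_bread_milk)  # 4/5
print("support(Milk)             =", support(transactions, {"Milk"}))  # 9/10
print("lift(Bread => Milk)       =", lift_bread_milk)  # 8/9
```

In this toy data the rule Bread ⇒ Milk has a confidence of 80 %, yet Milk alone appears in 90 % of the transactions, so the normalized value falls below 1. The high confidence reflects the frequency of Milk rather than any affinity between the two items.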

In negative association rule mining, we attempt to determine rules such as
Bread ⇒ ¬Butter, where the symbol ¬ indicates negation. Therefore, in this case
¬Butter becomes a pseudo-item denoting a “negative item.” One possibility is to add
negative items to the data, and perform the mining in the same way as one would
determine rules in the support-confidence framework. However, this is not a feasible
solution, because traditional support frameworks are not designed for cases where an
item is present in the data 98 % of the time, as is the case for “negative items.”
For example, most transactions may not contain the item Butter, and therefore even
positively correlated items may appear as negative rules. The rule Bread ⇒ ¬Butter
may have confidence greater than 50 %, even though Bread is clearly correlated in a
positive way with Butter. This is because the item ¬Butter may have an even higher
support of 98 %.
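
The following sketch works through the Bread and Butter example numerically. The transaction counts are invented so that ¬Butter has the 98 % support mentioned above, and the helper name `add_negative_item` is hypothetical, introduced only to show the "add pseudo-items and mine as usual" approach.

```python
from fractions import Fraction

NEG_BUTTER = "¬Butter"  # pseudo-item standing for "Butter is absent"

def add_negative_item(transactions, item, negative_label):
    """Add negative_label to every transaction that does not contain item."""
    return [t | {negative_label} if item not in t else t for t in transactions]

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical data: 100 transactions, Butter in 2 of them, Bread in 5.
raw = (
    [frozenset({"Bread", "Butter"})] * 2
    + [frozenset({"Bread"})] * 3
    + [frozenset({"Milk"})] * 95
)
augmented = add_negative_item(raw, "Butter", NEG_BUTTER)

conf_neg = confidence(augmented, {"Bread"}, {NEG_BUTTER})  # Bread => ¬Butter
base_neg = support(augmented, {NEG_BUTTER})                # how common ¬Butter is anyway
conf_pos = confidence(augmented, {"Bread"}, {"Butter"})    # Bread => Butter
base_pos = support(augmented, {"Butter"})                  # how common Butter is anyway

print("confidence(Bread => ¬Butter) =", conf_neg)  # 3/5
print("support(¬Butter)             =", base_neg)  # 49/50
print("confidence(Bread => Butter)  =", conf_pos)  # 2/5
print("support(Butter)              =", base_pos)  # 1/50
```

Under these assumed counts, Bread ⇒ ¬Butter passes a 50 % confidence threshold, yet its confidence of 60 % is well below the 98 % base rate of ¬Butter. At the same time, Butter is twenty times more likely in transactions containing Bread than in the data overall, which is the positive correlation that the negative rule hides.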

The issue of finding negative patterns is closely related to that of finding
interesting patterns in the data [6], because one is looking for patterns that satisfy
the support requirement in an interesting way. This relationship between the two
problems tends to be under-emphasized in the literature, and the problem of negative
pattern mining is often treated independently from interesting pattern mining. Some
frameworks, such as collective strength, are designed to address both issues
simultaneously. Methods for negative pattern mining are addressed in Chap. 6. The
relationship between interesting pattern mining and negative pattern mining will be
discussed in the same chapter.
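
As a rough illustration of how one measure can cover both positive and negative association, the sketch below computes collective strength on toy data. The definition used is the commonly cited one and is an assumption here, since this section does not state it: a transaction violates an itemset if it contains some but not all of its items, and collective strength compares the observed violation rate v with the rate E[v] expected if the items were independent, as ((1 - v) / (1 - E[v])) · (E[v] / v). The data are invented.

```python
from fractions import Fraction
from math import prod

def violation_rate(transactions, itemset):
    """Fraction of transactions containing some, but not all, items of itemset."""
    itemset = frozenset(itemset)
    violations = sum(1 for t in transactions if 0 < len(itemset & t) < len(itemset))
    return Fraction(violations, len(transactions))

def expected_violation_rate(transactions, itemset):
    """Violation rate expected if the items occurred independently."""
    n = len(transactions)
    p = [Fraction(sum(1 for t in transactions if item in t), n) for item in itemset]
    all_present = prod(p)
    none_present = prod(1 - x for x in p)
    return 1 - all_present - none_present

def collective_strength(transactions, itemset):
    """Assumed definition: ((1 - v) / (1 - E[v])) * (E[v] / v).

    Requires v > 0 and E[v] < 1; otherwise a ZeroDivisionError is raised.
    """
    v = violation_rate(transactions, itemset)
    ev = expected_violation_rate(transactions, itemset)
    return ((1 - v) / (1 - ev)) * (ev / v)

# Hypothetical data: 100 transactions, Butter in 2 of them, Bread in 5.
transactions = (
    [frozenset({"Bread", "Butter"})] * 2
    + [frozenset({"Bread"})] * 3
    + [frozenset({"Milk"})] * 95
)

strength = collective_strength(transactions, {"Bread", "Butter"})
print("collective strength(Bread, Butter) =", float(strength))
```

Under this definition, a value of 1 corresponds to independence, values above 1 indicate positive association, and values below 1 indicate negative association. The value for Bread and Butter in this toy data is above 1, in agreement with the positive correlation that confidence on the pseudo-item ¬Butter fails to show.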
