Interesting Patterns - Frequent Pattern Mining - page 107

Database Reference

In-Depth Information

abcd (1)

frequent

abc (2)

abd (3)

acd (1)

bcd (1)

abcd

ndi

⎡

⎢

⎢

⎢

⎢

⎣

⎤

⎥

⎥

⎥

⎦

1111

1110

1101

1000

1101

0010

ab (4)

ac (2)

bc (2)

ad (3)

bd (3)

cd (1)

free

a (5)

b (4)

c (2)

d (3)

∅

(6)

a

b

dataset

itemset lattice

Fig. 5.1 A dataset of 4 items and 6 transactions and the corresponding lattice. The lattice shows the

free, non-derivable, and, for minsup = 2, the frequent itemsets. Closed itemsets are highlighted

As an example, consider Fig. 5.1 , where we depict a toy dataset and the lattice of

all itemsets. Say we aim to mine all frequent itemsets with a minimal support of 2,

i.e., a minimum frequency of 2 / 6

1 / 3. The A Priori algorithm considers the lattice

level-wise, and first identifies the frequent singleton itemsets. Here, a , b , c , and d are

all frequent. It then constructs the candidate set by taking the Cartesian product with

the frequent singletons. In this example this is the full set of itemsets of cardinality 2,

i.e.,

=

. We calculate the support of all candidates, and find

that all itemsets, except cd , are frequent, i.e.,

C 2 ={

ab , ac , bc , ad , bd , cd

}

F 2 ={

ab , ac , bc , ad , bd

}

. Iterating to

the third level, we have

F 2 contain cd ,of

which we know it is not frequent, and hence neither will any larger itemset containing

it. We find that the two remaining candidates are frequent,

C 3 ={

abc , abd

}

, as all other extensions of

C 4 =∅

as there are no itemsets of size 4 that have all of their sub-itemsets of length 3 in

F 3 = C 3 . Finally,

F 3 .

Hence, the answer to the stated problem, the complete set of frequent itemsets for

minsup

.

The A Priori, or, perhaps more aptly named, level-wise algorithm can be applied

for any enumerable pattern language

=

2, is hence

F ={

a , b , c , ab , ac , bc , ad , bd , abc , abd

}

and monotonic interestingness measure q .

Soon after the discovery of the A Priori property, we see three major focal points

for further research. In particular, a lot of attention was given to investigating more

efficient algorithms for mining frequent itemsets [ 24 ] (see also Chaps. 2 and 3),

methods that can mine frequent patterns from data types other than binary (see

Chap. 11), and third, on further measures of interestingness (this chapter).

L

Pattern Explosion Now armed with the ability to mine frequent itemsets in prac-

tice, researchers quickly found that frequency is not quite the ideal interestingness

measure. That is, we find that for high support thresholds only find patterns represent-

ing common knowledge are discovered. However, when we lower the threshold, we

are typically quickly flooded with such enormous amounts of results that it becomes

impossible to inspect or use them. Moreover, the result set is highly redundant: very

many of the returned patterns are simply variations of each other. Combined, this

Next Page

Frequent Pattern Mining

Search WWH ::

Custom Search

Home