Frequent Pattern Mining Algorithms for Data Clustering - Frequent Pattern Mining - page 400

Database Reference

In-Depth Information

ABCD

ABCD

ABC

ABD

ACD

BCD

ABC

ACD

BCD

AB

AC

AC

BC

BC

AD

BD

CD

CD

A

B

C

C

D

{}

Fig. 16.3 Pruned search space during iterative database scans of Apriori (example): itemset { C }

has been found infrequent in the first scan, therefore, itemsets { A , C } , { B , C } , { C , D } do not need

to be considered in the second scan, itemsets { A , B , C } , { A , C , D } , { B , C , D } do not need to be

considered in the third scan, etc. In this example, Apriori stops scanning the database after round

three, as there is no candidate of length 4 remaining

More precisely, the pruning criterion used in the Apriori algorithm is based on the

equivalent anti-monotonic property, describing the opposite direction of deduction:

S

is not frequent

⇒∀ T ⊇ S

:

T

cannot be frequent either.

(16.2)

In the iterative procedure of repeated scans of the database for frequent itemsets,

this anti-monotonic property allows to ignore candidates that cannot be frequent

and, eventually, this pruning allows stopping at a certain size of itemsets, when

no candidates of typically moderate size remain to generate larger itemsets (see

Fig. 16.3 ).

An extension of the Apriori idea for very large itemsets has been termed 'colossal

patterns' [ 82 ]. The observation is that if one is interested in finding very large frequent

itemsets, then Apriori needs to generate many smaller frequent itemsets that are not

relevant for the result. This effect can be used positively, in that if large patterns also

have a large number of subsets, several of these subsets can be combined in order to

obtain larger candidates directly. In this sense, the idea is to avoid the full search, and

instead use some results at the bottom of the search space as a shortcut to particularly

promising candidates higher up. This approach thus trades some of the accuracy of

full search for a much more efficient frequent pattern mining algorithm. As we will

see below, both the Apriori algorithm, as well as that of colossal patterns have been

employed towards mining subspace clusters.

2.1

Generalized Monotonicity

In data clustering, we typically do not consider binary transaction data, or discrete

data in general, but instead most often study continuous real-valued vector data, typ-

ically assuming a Euclidean vector space. In this space, attributes may be noisy, or

Next Page

Frequent Pattern Mining

Search WWH ::

Custom Search

Home