Data Mining of Association Rules and the Process of Knowledge Discovery in Databases - Advances in Data Mining

Information Technology Reference

In-Depth Information

{a}

{b}

{c}

{d}

{e}

{a, b}

{a, c}

{a, d}

{a, e}

{b, c}

{b, d}

{b, e}

{c, d}

{c, e}

{d, e}

{a, b, c}

{a, b, d}

{a, b, e}

{a, c, d}

{a, c, e}

{a, d, e}

{b, c, d}

{b, c, e}

{b, d, e}

{c, d, e}

{a, b, c, d}

{a, b, c, e}

{a, b, d, e}

{a, c, d, e}

{b, c, d, e}

{a, b, c, d, e}

Fig. 2. The Search Space for I = {a, b, c, d, e}

as candidates. Then the algorithm should descend to the lower levels. This can

be done by breadth-first search or depth-first search. The border is a natural

barrier where the step-wise search space traversal stops. From the itemsets below

the border only the itemsets immediately at the border may be considered as

candidates. Looking at these itemsets is necessary to exactly identify the course

of the border.

There are two common approaches to actually determine the support of the

candidates. The first approach is to directly count sets of candidates. For each

candidate a counter is set up and the algorithm then passes over the complete

database of transactions. Whenever a transactions contains one of the candidates

its counter is incremented. E - ciently looking up candidates in transactions re-

quires specialized data structures, e.g. hashtrees or prefix trees, c.f. [3,6].

Alternatively the support values of candidates can be determined indirectly

by set intersections. For that purpose so called transaction sets are employed.

The transaction set X. tids of an itemset X is defined as the set of all transactions

this itemset is contained in:

X. tids = {T ∈D|X ⊆ T}.

For the support follows

supp( X )= |X. tids |

|D|

.

Determining the transaction sets for the frequent 1-itemsets is straight forward.

It simply requires to transform the database from so called horizontal layout

to vertical layout. We initialize an empty list to hold the transaction set for

each item. Then we pass over all transactions and for each item in the current

Advances in Data Mining

Search WWH ::

Custom Search

Home