Information Technology Reference
In-Depth Information
{a}
{b}
{c}
{d}
{e}
{a, b}
{a, c}
{a, d}
{a, e}
{b, c}
{b, d}
{b, e}
{c, d}
{c, e}
{d, e}
{a, b, c}
{a, b, d}
{a, b, e}
{a, c, d}
{a, c, e}
{a, d, e}
{b, c, d}
{b, c, e}
{b, d, e}
{c, d, e}
{a, b, c, d}
{a, b, c, e}
{a, b, d, e}
{a, c, d, e}
{b, c, d, e}
{a, b, c, d, e}
Fig. 2. The Search Space for I = {a, b, c, d, e}
as candidates. Then the algorithm should descend to the lower levels. This can
be done by breadth-first search or depth-first search. The border is a natural
barrier where the step-wise search space traversal stops. From the itemsets below
the border only the itemsets immediately at the border may be considered as
candidates. Looking at these itemsets is necessary to exactly identify the course
of the border.
There are two common approaches to actually determine the support of the
candidates. The first approach is to directly count sets of candidates. For each
candidate a counter is set up and the algorithm then passes over the complete
database of transactions. Whenever a transactions contains one of the candidates
its counter is incremented. E - ciently looking up candidates in transactions re-
quires specialized data structures, e.g. hashtrees or prefix trees, c.f. [3,6].
Alternatively the support values of candidates can be determined indirectly
by set intersections. For that purpose so called transaction sets are employed.
The transaction set X. tids of an itemset X is defined as the set of all transactions
this itemset is contained in:
X. tids = {T ∈D|X ⊆ T}.
For the support follows
supp( X )= |X. tids |
|D|
.
Determining the transaction sets for the frequent 1-itemsets is straight forward.
It simply requires to transform the database from so called horizontal layout
to vertical layout. We initialize an empty list to hold the transaction set for
each item. Then we pass over all transactions and for each item in the current
Search WWH ::




Custom Search