using the triples method. That is, some pairs would have very high counts, and the number of different pairs that occurred in one or more baskets would be much less than the theoretical maximum number of such pairs.
6.2.3 Monotonicity of Itemsets
Much of the effectiveness of the algorithms we shall discuss is driven by a single observation, called monotonicity for itemsets:
• If a set I of items is frequent, then so is every subset of I.
The reason is simple. Let J ⊆ I. Then every basket that contains all the items in I surely contains all the items in J. Thus, the count for J must be at least as great as the count for I, and if the count for I is at least s, then the count for J is at least s. Since J may be contained in some baskets that are missing one or more elements of I − J, it is entirely possible that the count for J is strictly greater than the count for I.
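To make the observation concrete, here is a minimal sketch in Python; the basket data and the support helper are our own illustrations, not the data of Example 6.1.

    from itertools import combinations

    def support(itemset, baskets):
        """Count how many baskets contain every item of itemset."""
        return sum(1 for b in baskets if itemset <= b)

    # Toy baskets, purely illustrative.
    baskets = [
        {"cat", "dog", "a"},
        {"dog", "a", "and"},
        {"cat", "dog", "and"},
        {"dog", "training"},
    ]

    I = frozenset({"cat", "dog"})
    # Monotonicity: every subset J of I is at least as frequent as I.
    for r in range(1, len(I) + 1):
        for J in map(frozenset, combinations(I, r)):
            assert support(J, baskets) >= support(I, baskets)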
In addition to making the A-Priori Algorithm work, monotonicity offers us a way to compact the information about frequent itemsets. If we are given a support threshold s, then we say an itemset is maximal if no superset is frequent. If we list only the maximal itemsets, then we know that all subsets of a maximal itemset are frequent, and no set that is not a subset of some maximal itemset can be frequent.
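A short sketch of this compaction, assuming the frequent itemsets are given as Python sets; the helper name maximal is ours, not the book's.

    def maximal(frequent):
        """Keep only the itemsets with no frequent proper superset."""
        fsets = [frozenset(s) for s in frequent]
        # s < t is Python's proper-subset test on frozensets.
        return [s for s in fsets if not any(s < t for t in fsets)]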
EXAMPLE 6.7 Let us reconsider the data of Example 6.1 with support threshold s = 3. We found that there were five frequent singletons, those with words “cat,” “dog,” “a,” “and,” and “training.” Each of these is contained in a frequent doubleton, except for “training,” so one maximal frequent itemset is {training}. There are also four frequent doubletons with s = 3, namely
{dog, a}
{dog, and}
{dog, cat}
{cat, and}
As there are no frequent triples for support threshold 3, these are all maximal frequent itemsets, and there are no others. Notice that we can deduce from the frequent doubletons that singletons like {dog} are frequent.
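As a check, applying the maximal helper sketched above to the frequent itemsets listed in this example reproduces the conclusion:

    frequent = [{"cat"}, {"dog"}, {"a"}, {"and"}, {"training"},
                {"dog", "a"}, {"dog", "and"}, {"dog", "cat"}, {"cat", "and"}]
    for s in maximal(frequent):
        print(set(s))
    # Prints {'training'} and the four frequent doubletons; every other
    # frequent itemset, such as {'dog'}, is a subset of one of these.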