using the triples method. That is, some pairs would have very high counts, and the number of different pairs that occurred in one or more baskets would be much less than the theoretical maximum number of such pairs.
6.2.3 Monotonicity of Itemsets
Much of the effectiveness of the algorithms we shall discuss is driven by a single observation, called monotonicity for itemsets:
• If a set I of items is frequent, then so is every subset of I.
The reason is simple. Let J ⊆ I. Then every basket that contains all the items in I surely contains all the items in J. Thus, the count for J must be at least as great as the count for I, and if the count for I is at least s, then the count for J is at least s. Since J may be contained in some baskets that are missing one or more elements of I − J, it is entirely possible that the count for J is strictly greater than the count for I.
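To make the observation concrete, here is a minimal sketch in Python; the basket data and the support helper are our own illustrations, not the data of Example 6.1.

    from itertools import combinations

    def support(itemset, baskets):
        """Count how many baskets contain every item of itemset."""
        return sum(1 for b in baskets if itemset <= b)

    # Toy baskets, purely illustrative.
    baskets = [
        {"cat", "dog", "a"},
        {"dog", "a", "and"},
        {"cat", "dog", "and"},
        {"dog", "training"},
    ]

    I = frozenset({"cat", "dog"})
    # Monotonicity: every subset J of I is at least as frequent as I.
    for r in range(1, len(I) + 1):
        for J in map(frozenset, combinations(I, r)):
            assert support(J, baskets) >= support(I, baskets)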
In addition to making the A-Priori Algorithm work, monotonicity offers us a way to compact the information about frequent itemsets. If we are given a support threshold s, then we say an itemset is maximal if no superset is frequent. If we list only the maximal itemsets, then we know that all subsets of a maximal itemset are frequent, and no set that is not a subset of some maximal itemset can be frequent.
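A short sketch of this compaction, assuming the frequent itemsets are given as Python sets; the helper name maximal is ours, not the book's.

    def maximal(frequent):
        """Keep only the itemsets with no frequent proper superset."""
        fsets = [frozenset(s) for s in frequent]
        # s < t is Python's proper-subset test on frozensets.
        return [s for s in fsets if not any(s < t for t in fsets)]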
EXAMPLE 6.7 Let us reconsider the data of Example 6.1 with support threshold s = 3. We found that there were five frequent singletons, those with words “cat,” “dog,” “a,” “and,” and “training.” Each of these is contained in a frequent doubleton, except for “training,” so one maximal frequent itemset is {training}. There are also four frequent doubletons with s = 3, namely
{dog, a}
{dog, and}
{dog, cat}
{cat, and}
As there are no frequent triples for support threshold 3, these are all maximal frequent itemsets, and there are no others. Notice that we can deduce from the frequent doubletons that singletons like {dog} are frequent.
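As a check, applying the maximal helper sketched above to the frequent itemsets listed in this example reproduces the conclusion:

    frequent = [{"cat"}, {"dog"}, {"a"}, {"and"}, {"training"},
                {"dog", "a"}, {"dog", "and"}, {"dog", "cat"}, {"cat", "and"}]
    for s in maximal(frequent):
        print(set(s))
    # Prints {'training'} and the four frequent doubletons; every other
    # frequent itemset, such as {'dog'}, is a subset of one of these.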