Database Reference
In-Depth Information
get the average frequency of all itemsets of size t . At least one must have a frequency that
is at least average, so if this average is at least s , we know an instance of K s,t exists.
Now we provide the detailed calculation. Suppose the degree of the i th node on the right
is d i ; that is, d i is the size of the i th basket. Then this basket contributes to itemsets
of size t . The total contribution of the n nodes on the right is The value of this
sum depends on the d i , of course. However, we know that the average value of d i is d . It
is known that this sum is minimized when each d i is d . We shall not prove this point, but a
simple example will suggest the reasoning; since grows roughly as the t th power of d i ,
moving 1 from a large d i to some smaller d j will reduce the sum of
EXAMPLE 10.12 Suppose there are only two nodes, t = 2, and the average degree of the
nodes is 4. Then d 1 + d = 8, and the sum of interest is
If d 1 = d 2 = 4, then the sum is
However, if d 1 = 5 and d 2 = 3, the sum is
= 10 + 3 = 13. If d 1 = 6 and d 1 = 2,
then the sum is
= 15 + 1 = 16.
Thus, in what follows, we shall assume that all nodes have the average degree d . So do-
ing minimizes the total contribution to the counts for the itemsets, and thus makes it least
likely that there will be a frequent itemset (itemset with with support s or more) of size t .
Observe the following:
• The total contribution of the n nodes on the right to the counts of the itemsets of
size t is
• The number of itemsets of size t is
• Thus, the average count of an itemset of size t is
this expression must be at
least s if we are to argue that an instance of K s,t exists.
If we expand the binomial coefficients in terms of factorials, we find
To simplify the formula above, let us assume that n is much larger than d , and d is much
larger than t . Then d ( d − 1) · · · ( d t + 1) is approximately d t , and n ( n − 1) · · · ( n t + 1)
is approximately n t . We thus require that
n ( d/n ) t s
That is, if there is a community with n nodes on each side, the average degree of the nodes
is d , and n ( d/n ) t s , then this community is guaranteed to have a complete bipartite sub-
graph K s,t . Moreover, we can find the instance of K s,t efficiently, using the methods of
Search WWH ::




Custom Search