Database Reference
In-Depth Information
get the average frequency of all itemsets of size
t
. At least one must have a frequency that
is at least average, so if this average is at least
s
, we know an instance of
K
s,t
exists.
Now we provide the detailed calculation. Suppose the degree of the
i
th node on the right
is
d
i
; that is,
d
i
is the size of the
i
th basket. Then this basket contributes to itemsets
of size
t
. The total contribution of the
n
nodes on the right is The value of this
sum depends on the
d
i
, of course. However, we know that the average value of
d
i
is
d
. It
is known that this sum is minimized when each
d
i
is
d
. We shall not prove this point, but a
simple example will suggest the reasoning; since grows roughly as the
t
th power of
d
i
,
moving 1 from a large
d
i
to some smaller
d
j
will reduce the sum of
EXAMPLE
10.12 Suppose there are only two nodes,
t
= 2, and the average degree of the
nodes is 4. Then
d
1
+
d
= 8, and the sum of interest is
If
d
1
=
d
2
= 4, then the sum is
However, if
d
1
= 5 and
d
2
= 3, the sum is
= 10 + 3 = 13. If
d
1
= 6 and
d
1
= 2,
then the sum is
= 15 + 1 = 16.
□
Thus, in what follows, we shall assume that all nodes have the average degree
d
. So do-
ing minimizes the total contribution to the counts for the itemsets, and thus makes it least
likely that there will be a frequent itemset (itemset with with support
s
or more) of size
t
.
Observe the following:
• The total contribution of the
n
nodes on the right to the counts of the itemsets of
size
t
is
• The number of itemsets of size
t
is
• Thus, the average count of an itemset of size
t
is
this expression must be at
least
s
if we are to argue that an instance of
K
s,t
exists.
If we expand the binomial coefficients in terms of factorials, we find
To simplify the formula above, let us assume that
n
is much larger than
d
, and
d
is much
larger than
t
. Then
d
(
d
− 1) · · · (
d
−
t
+ 1) is approximately
d
t
, and
n
(
n
− 1) · · · (
n
−
t
+ 1)
is approximately
n
t
. We thus require that
n
(
d/n
)
t
≥
s
That is, if there is a community with
n
nodes on each side, the average degree of the nodes
is
d
, and
n
(
d/n
)
t
≥
s
, then this community is guaranteed to have a complete bipartite sub-
graph
K
s,t
. Moreover, we can find the instance of
K
s,t
efficiently, using the methods of