Database Reference
In-Depth Information
shall call the
left
side. We assume that the instance of
K
s,t
we are looking for has
t
nodes on
the left side, and we shall also assume for efficiency that
t
≤
s
. The “baskets” correspond to
the nodes on the other side of
G
(the
right
side). The members of the basket for node
v
are
the nodes of the left side to which
v
is connected. Finally, let the support threshold be
s
, the
number of nodes that the instance of
K
s,t
has on the right side.
We can now state the problem of finding instances of
K
s,t
as that of finding frequent
itemsets
F
of size
t
. That is, if a set of
t
nodes on the left side is frequent, then they all occur
together in at least
s
baskets. But the baskets are the nodes on the right side. Each basket
corresponds to a node that is connected to all
t
of the nodes in
F
. Thus, the frequent itemset
of size
t
and
s
of the baskets in which all those items appear form an instance of
K
s,t
.
EXAMPLE
10.11 Recall the bipartite graph of
Fig. 8.1
, which we repeat here as
Fig. 10.10
.
The left side is the nodes {1
,
2
,
3
,
4} and the right side is {
a, b, c, d
}. The latter are the
baskets, so basket
a
consists of “items” 1 and 4; that is,
a
= {1
,
4}. Similarly,
b
= {2
,
3},
c
= {1} and
d
= {3}.
Figure 10.10
The bipartite graph from
Fig. 8.1
If
s
= 2 and
t
= 1, we must find itemsets of size 1 that appear in at least two baskets. {1}
is one such itemset, and {3} is another. However, in this tiny example there are no itemsets
for larger, more interesting, values of
s
and
t
, such as
s
=
t
= 2.
□
10.3.4
Why Complete Bipartite Graphs Must Exist
We must now turn to the matter of demonstrating that any bipartite graph with a sufficiently
high fraction of the edges present will have an instance of
K
s,t
. In what follows, assume that
the graph
G
has
n
nodes on the left and another
n
nodes on the right. Assume the two sides
have the same number of nodes simplifies the calculation, but the argument generalizes to
sides of any size. Finally, let
d
be the average degree of all nodes.
The argument involves counting the number of frequent itemsets of size
t
that a basket
with
d
items contributes to. When we sum this number over all nodes on the right side, we
get the total frequency of all the subsets of size
t
on the left. When we divide by
we