Database Reference
In-Depth Information
shall call the left side. We assume that the instance of K s,t we are looking for has t nodes on
the left side, and we shall also assume for efficiency that t s . The “baskets” correspond to
the nodes on the other side of G (the right side). The members of the basket for node v are
the nodes of the left side to which v is connected. Finally, let the support threshold be s , the
number of nodes that the instance of K s,t has on the right side.
We can now state the problem of finding instances of K s,t as that of finding frequent
itemsets F of size t . That is, if a set of t nodes on the left side is frequent, then they all occur
together in at least s baskets. But the baskets are the nodes on the right side. Each basket
corresponds to a node that is connected to all t of the nodes in F . Thus, the frequent itemset
of size t and s of the baskets in which all those items appear form an instance of K s,t .
EXAMPLE 10.11 Recall the bipartite graph of Fig. 8.1 , which we repeat here as Fig. 10.10 .
The left side is the nodes {1 , 2 , 3 , 4} and the right side is { a, b, c, d }. The latter are the
baskets, so basket a consists of “items” 1 and 4; that is, a = {1 , 4}. Similarly, b = {2 , 3}, c
= {1} and d = {3}.
Figure 10.10 The bipartite graph from Fig. 8.1
If s = 2 and t = 1, we must find itemsets of size 1 that appear in at least two baskets. {1}
is one such itemset, and {3} is another. However, in this tiny example there are no itemsets
for larger, more interesting, values of s and t , such as s = t = 2.
10.3.4
Why Complete Bipartite Graphs Must Exist
We must now turn to the matter of demonstrating that any bipartite graph with a sufficiently
high fraction of the edges present will have an instance of K s,t . In what follows, assume that
the graph G has n nodes on the left and another n nodes on the right. Assume the two sides
have the same number of nodes simplifies the calculation, but the argument generalizes to
sides of any size. Finally, let d be the average degree of all nodes.
The argument involves counting the number of frequent itemsets of size t that a basket
with d items contributes to. When we sum this number over all nodes on the right side, we
get the total frequency of all the subsets of size t on the left. When we divide by
we
Search WWH ::




Custom Search