Mining Social-Network Graphs - Mining of Massive Datasets

Database Reference

In-Depth Information

shall call the left side. We assume that the instance of K s,t we are looking for has t nodes on

the left side, and we shall also assume for efficiency that t ≤ s . The “baskets” correspond to

the nodes on the other side of G (the right side). The members of the basket for node v are

the nodes of the left side to which v is connected. Finally, let the support threshold be s , the

number of nodes that the instance of K s,t has on the right side.

We can now state the problem of finding instances of K s,t as that of finding frequent

itemsets F of size t . That is, if a set of t nodes on the left side is frequent, then they all occur

together in at least s baskets. But the baskets are the nodes on the right side. Each basket

corresponds to a node that is connected to all t of the nodes in F . Thus, the frequent itemset

of size t and s of the baskets in which all those items appear form an instance of K s,t .

EXAMPLE 10.11 Recall the bipartite graph of Fig. 8.1 , which we repeat here as Fig. 10.10 .

The left side is the nodes {1 , 2 , 3 , 4} and the right side is { a, b, c, d }. The latter are the

baskets, so basket a consists of “items” 1 and 4; that is, a = {1 , 4}. Similarly, b = {2 , 3}, c

= {1} and d = {3}.

Figure 10.10 The bipartite graph from Fig. 8.1

If s = 2 and t = 1, we must find itemsets of size 1 that appear in at least two baskets. {1}

is one such itemset, and {3} is another. However, in this tiny example there are no itemsets

for larger, more interesting, values of s and t , such as s = t = 2.

□

10.3.4

Why Complete Bipartite Graphs Must Exist

We must now turn to the matter of demonstrating that any bipartite graph with a sufficiently

high fraction of the edges present will have an instance of K s,t . In what follows, assume that

the graph G has n nodes on the left and another n nodes on the right. Assume the two sides

have the same number of nodes simplifies the calculation, but the argument generalizes to

sides of any size. Finally, let d be the average degree of all nodes.

The argument involves counting the number of frequent itemsets of size t that a basket

with d items contributes to. When we sum this number over all nodes on the right side, we

get the total frequency of all the subsets of size t on the left. When we divide by

we

Search WWH ::

Custom Search

Home