Link Analysis - Mining of Massive Datasets

Database Reference

In-Depth Information

its predecessors have PageRanks computed. These predecessors are A and D . In Fig. 5.4 , A

has three successors, so it contributes 1/3 of its PageRank to C . Page D has two successors

in Fig. 5.4 , so it contributes half its PageRank to C . Thus, the PageRank of C is

Now we can compute the PageRank for E . That node has only one predecessor, C , and

C has only one successor. Thus, the PageRank of E is the same as that of C . Note that

the sums of the PageRanks exceed 1, and they no longer represent the distribution of a

random surfer. Yet they do represent decent estimates of the relative importance of the

pages.

□

5.1.5

Spider Traps and Taxation

As we mentioned, a spider trap is a set of nodes with no dead ends but no arcs out.

These structures can appear intentionally or unintentionally on the Web, and they cause the

PageRank calculation to place all the PageRank within the spider traps.

EXAMPLE 5.5 Consider Fig. 5.6 , which is Fig. 5.1 with the arc out of C changed to point to

C itself. That change makes C a simple spider trap of one node. Note that in general spider

traps can have many nodes, and as we shall see in Section 5.4 , there are spider traps with

millions of nodes that spammers construct intentionally.

Figure 5.6 A graph with a one-node spider trap

The transition matrix for Fig. 5.6 is

If we perform the usual iteration to compute the PageRank of the nodes, we get

As predicted, all the PageRank is at C , since once there a random surfer can never

leave.

□

Search WWH ::

Custom Search

Home