by the work on temporal graph evolution, called Forest Fire sampling. It 'burns', or
keeps, outgoing links and the corresponding nodes with a certain probability. If
a link gets burned, the node at the other endpoint gets a chance to burn its own
links, and so on recursively building a network out of the initial node, burned edges
and their nodes. This model has two parameters: forward and backward burning
probabilities. We propose a very similar approach in the following section.
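To make the burning process concrete, the following is a minimal sketch rather than the original Forest Fire implementation. It assumes the graph is given as two adjacency dictionaries (the hypothetical names out_links and in_links), and it burns each link independently with a fixed probability, which is a simplification of the model described above. The function name and all parameter names are illustrative.

```python
import random
from collections import deque


def forest_fire_sample(out_links, in_links, seed_node,
                       p_forward, p_backward, rng=None):
    """Burn a network outward from seed_node.

    out_links / in_links: dict mapping a node to an iterable of neighbours
    (assumed representation). Returns (visited_nodes, burned_edges).
    """
    rng = rng or random.Random()
    visited = {seed_node}
    burned_edges = set()
    frontier = deque([seed_node])
    while frontier:
        node = frontier.popleft()
        # Forward burning: keep each outgoing link with probability p_forward.
        for nbr in out_links.get(node, ()):
            if rng.random() < p_forward:
                burned_edges.add((node, nbr))
                if nbr not in visited:
                    visited.add(nbr)
                    frontier.append(nbr)  # nbr later burns its own links
        # Backward burning: keep each incoming link with probability p_backward.
        for nbr in in_links.get(node, ()):
            if rng.random() < p_backward:
                burned_edges.add((nbr, node))
                if nbr not in visited:
                    visited.add(nbr)
                    frontier.append(nbr)
    return visited, burned_edges
```

Each node is expanded at most once, so the process terminates on any finite graph. Setting p_backward to zero leaves only the forward direction of the fire.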
3 Methodology
Our ego network sampling approach is a variation of the Forest Fire algorithm. In
our algorithm, we set the backward burning probability to zero, as we only allow
a node one opportunity to obtain edges linking in to it, and use the Yahoo! Site
Explorer for in-link estimation. The steps are as follows:
1. Given a web site, get the total number of in-links from Yahoo! [1].
2. We get 100 in-links for the web site from Yahoo! [2].
3. We find the unique domains in the in-link list and calculate rate, defined as r = (# of uniques) / 100.
4. We randomly pick n links from the unique in-link list [3], where n is a geometric random number with mean proportional to log(#in-links × r).
5. Repeat the above steps for nodes of the burned links.
6. Stop when it burns R levels deep. We suggest using R = 3.
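The steps above can be sketched in code as follows. This is an illustration under stated assumptions, not our production crawler. The callable fetch_inlinks is hypothetical and stands in for the Yahoo! Site Explorer query, returning the total in-link count and a list of in-link URLs. The argument of the logarithm is read as the product of the in-link count and r, which is an assumption. The scale factor (the constant of proportionality) and all function names are illustrative.

```python
import math
import random
from urllib.parse import urlparse


def unique_domains(urls):
    """Distinct host names of the URLs, in first-seen order."""
    seen = {}
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host:
            seen.setdefault(host, None)
    return list(seen)


def geometric(mean, rng):
    """Geometric variate on {1, 2, ...}; means at or below 1 yield 1."""
    if mean <= 1.0:
        return 1
    p = 1.0 / mean          # success probability giving the requested mean
    u = rng.random()        # uniform in [0, 1)
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))


def sample_ego_network(site, fetch_inlinks, scale=1.0, max_depth=3, rng=None):
    """Burn in-links from `site`, at most max_depth (R) levels deep.

    fetch_inlinks(domain) -> (total_inlinks, list_of_inlink_urls) is a
    hypothetical stand-in for the in-link service.
    """
    rng = rng or random.Random()
    visited, edges = {site}, set()
    frontier = [site]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            total, urls = fetch_inlinks(node)
            domains = [d for d in unique_domains(urls[:100]) if d != node]
            if total <= 0 or not domains:
                continue
            rate = len(domains) / 100                      # r = uniques / 100
            mean = scale * math.log(max(total * rate, 1.0))  # assumed product
            n = min(geometric(mean, rng), len(domains))
            for src in rng.sample(domains, n):
                edges.add((src, node))                     # burned in-link
                if src not in visited:                     # expand a node once
                    visited.add(src)
                    next_frontier.append(src)
        frontier = next_frontier
    return visited, edges
```

Because each node enters the frontier at most once, a node has a single opportunity to obtain in-links, which corresponds to a backward burning probability of zero.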
We then repeat the above steps to get a total of three random ego networks for the
same web site. We assume that the network structure of the web site stays the same
within a short time period. The differences observed in the three networks are due
to the randomness of the sampling. The three networks are studied individually and
in combination.
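The text does not fix how the three samples are combined, so the following sketch shows only one plausible option, assumed here for illustration: count in how many of the sampled networks each directed edge appears, then take either the union or the edges seen in at least a chosen number of runs. The function names and the min_runs parameter are hypothetical.

```python
from collections import Counter


def edge_counts(edge_sets):
    """Count how many sampled networks contain each directed edge."""
    counts = Counter()
    for edges in edge_sets:
        counts.update(set(edges))  # each network contributes an edge once
    return counts


def union_network(counts):
    """All edges seen in at least one sample."""
    return set(counts)


def stable_network(counts, min_runs):
    """Edges seen in at least min_runs of the samples."""
    return {edge for edge, c in counts.items() if c >= min_runs}
```

Edges that recur across samples are less likely to be artifacts of the random burning, which is one reason to inspect the combined network alongside the individual ones.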
Figure 1 shows sampled ego networks for two web sites. Each row displays the three
sampled networks for one web site. The positions of nodes (web sites) are not fixed
across these graphs. That is, a web site may appear in each of the three ego
networks but be drawn at a different location in each displayed network.
As we can see from the figure, the three generated networks of a web
site are different from each other, but the general patterns are preserved. The upper
networks show great interconnectivity, while the lower ones are more star-like.
[1] Yahoo! treats a domain with and without the “www.” prefix as two different domains. Our
solution to this problem is to get the total number of in-links for both, and use the one with
the higher value to create the network.
[2] The Site Explorer ranks the in-links in order of importance. It returns a maximum of 100
in-links. We use the in-links options of “Except from this domain” and “to Entire site” to
get an accurate picture of the external links.
[3] By our observation, the Yahoo! in-link distribution is extremely heavy-tailed. We understand
that taking logarithms will change the degree distribution of the network, but it is the
most suitable approach to downsize the network.
 