Information Technology Reference
In-Depth Information
estimating the network structures and determining their relationship with the global
properties of the web sites.
In this the topic chapter we describe our method for creating ego-centric networks
for the application of viral marketing. To understand how well our method captures
the true underlying network, we use a well-known data set, the Stanford-Berkeley
network, to represent the 'true' network. We carry-out several experiments to mea-
sure if, on several key metrics, our technique provides good approximations with
minimal data collection required. The results indicate that the ego-centric network
approach captures the structure of the true network, increases in accuracy with more
data, and does better than a baseline approach. Next we describe the ego-centric ap-
proach, the experiments and results.
2
Ego-Centric Networks
Understanding the web and its properties has been a hot research topic since its
inception [17]. One of the biggest algorithmic challenges is that the size of the web is
too huge to grasp its complete picture, hence some sampling procedure is beneficial.
Given a huge web graph, how can we derive a representative sample that pre-
serves the major properties of the original graph? [6] proposed a specific random
walk on the (directed) web graph, however it is not clear how many steps are re-
quired in order to approximate the equilibrium distribution. [1] proposed asking
various search engines for the in-links of a given page in order to sample all adja-
cent edges of a given page. However, frequently only a subset of all in-links can be
found in this way. [8] developed the HITS algorithm, which queries an index-based
search engine (for example Google.com) to find web pages related to some query.
The resulting web pages are then expanded to include in-links to and out-links from
the web pages. Network metrics are then used to assign authority and give promi-
nence to more structurally important web pages.
From the social sciences, a related and interesting approach is snow-ball sam-
pling, see [12] for a recent, Internet-related discussion. In this technique, initial
seeds are recruited to report their immediate social network, who are then sub-
squently recruited. Related work has focused on developing sampling techniques
that estimate the true population size accurately and avoid biases, an example being
respondent-driven sampling [16].
Many known network sampling algorithms may be applied in our ego-centric
framework. The natural questions to ask here are 1) which sampling method to use,
2) how small of a sample size can be used and 3) how to scale up the measurements
of the sample in order to get the measurements of the whole graph, if needed. [5]
present a nice study on some of the sampling methods in terms of how good they are
in addressing the above questions. The methods fall into two categories, sampling
by random node/edge selection and sampling by exploration. The latter, like snow-
ball sampling, explores the nodes in a given node's vicinity, which fits into our ego
network framework. According to their study, one of the better performing methods,
if not always the best under different circumstances, is a sampling approach inspired
 
Search WWH ::




Custom Search