some specified way in order to form a web. In the sociology literature an element is also
called a node or a vertex and the connecting line or arc is called an edge, a terminology
developed in the mathematics of graph theory that dates back 300 years to the polymath
Leonhard Euler (1707-1783). The mathematicians Erdős and Rényi [20] in the 1950s
and 1960s applied this theory to the investigation of the properties of random webs and
determined that the probability that m edges terminate on a given vertex is given by
a Poisson distribution. There are some excellent popular reviews [5, 12, 42, 43] of the
ground-breaking Erdős-Rényi work on random networks and we review some of that
work in Chapter 6.
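
The Poisson result can be checked numerically. The following is a minimal sketch, not the construction used in the text: it assumes a random web in which every pair of vertices is linked independently with a fixed probability p, and the function names and the values of n and p are illustrative choices. It counts how many links terminate on each vertex and compares the observed fractions with the Poisson probabilities for the same average.

```python
import math
import random
from collections import Counter


def random_web_degrees(n, p, seed=0):
    """Return the degree of every vertex in one random web of n vertices,
    where each pair of vertices is linked independently with probability p."""
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree


def poisson_pmf(m, mean):
    """Poisson probability of exactly m links when the average is `mean`."""
    return math.exp(-mean) * mean ** m / math.factorial(m)


n, p = 1000, 0.006            # illustrative values; mean degree is (n - 1) * p
expected_mean = (n - 1) * p
degrees = random_web_degrees(n, p)
observed_mean = sum(degrees) / n
counts = Counter(degrees)

print(f"expected mean degree {expected_mean:.2f}, observed {observed_mean:.2f}")
for m in range(15):
    simulated = counts[m] / n
    print(f"m={m:2d}  simulated={simulated:.3f}  Poisson={poisson_pmf(m, expected_mean):.3f}")
```

With these settings the simulated fractions should cluster around a single peak near the average degree, which is the behavior discussed next.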
An interesting property of the Poisson distribution is that it has a single peak located
at the average value; in this regard it is very similar to the distribution of Gauss. In
fact, similarly to processes described by the normal distribution, those random webs
described by the Poisson distribution are dominated by averages. In a Poisson world, just
like in a Gaussian world, as Barabási [5] pointed out, most people would have roughly
the same number of acquaintances; most neurons would connect to the same typical
number of other neurons; most companies would trade with approximately the same
number of other companies; most web sites would be visited by the same number of
visitors, more or less. But the real world is not like that of Gauss or Poisson. Barabási [5]
and Watts [43] showed that average measures are not appropriate for characterizing
real-world networks, such as the Internet and the World Wide Web. West [45] showed that
average measures in medicine are equally inappropriate and a number of the physiologic
webs he considered are discussed in more detail later.
Figure 4.3 depicts the cumulative distribution of the number of links to web sites,
with data obtained from a 1997 web crawl of about 200 million pages [11]. Clauset
et al. [15] point out that this is a limited sample of the much larger entire web, and
that it is described by an inverse power law with an index of approximately 2.34.
The exact value of the inverse power-law index should not be given much credence
because of the large fluctuations in the data, a property common in the tails of such
distributions. The best we can do here, without going into the details of data processing,
is to recognize the inverse power-law behavior of the phenomenon.
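
To see why the tail of such a distribution fluctuates so strongly, consider the following sketch on synthetic data. It does not use the web-crawl data; it assumes a continuous variable whose density falls off as an inverse power law with index alpha above a lower cutoff x_min = 1, so that the fraction of values at or above x falls as x^-(alpha - 1). Treating 2.34 as the index of the density is an assumption made for this sketch, and all function names are hypothetical. The index is estimated with the standard maximum-likelihood formula, 1 + n / sum(ln(x_i / x_min)), rather than by reading a slope off a log-log plot.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n synthetic values with density proportional to x**(-alpha)
    for x >= x_min, using inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def ccdf(samples, x):
    """Fraction of samples that are at least x (the cumulative distribution
    counted from the top, as plotted on log-log axes)."""
    return sum(1 for s in samples if s >= x) / len(samples)


def estimate_index(samples, x_min=1.0):
    """Maximum-likelihood estimate of the index for continuous data."""
    tail = [s for s in samples if s >= x_min]
    return 1.0 + len(tail) / sum(math.log(s / x_min) for s in tail)


alpha = 2.34                                  # assumed index of the density
samples = sample_power_law(50_000, alpha)     # synthetic data, not the crawl

for x in (1, 10, 100, 1000):
    n_above = sum(1 for s in samples if s >= x)
    model = x ** -(alpha - 1)
    print(f"x={x:5d}  empirical={ccdf(samples, x):.5f}  model={model:.5f}  points={n_above}")

print(f"estimated index: {estimate_index(samples):.3f}")
```

Even with 50,000 synthetic values, only a handful are expected to lie beyond x = 1000, so the empirical curve there rests on very few points. This is the sense in which the far tail is noisy and a fitted index carries limited precision.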
A related graph is given in Figure 4.4, where the cumulative distribution in the number
of hits received by web sites, to servers not to pages, from customers of the America
Figure 4.3. The cumulative distribution in the number of links to web sites is shown by
the circles, and is approximately described by an inverse power law with slope 2.34 [15].
Reproduced with permission.