Webs, trees and branches - Complex Webs: Anticipating the Improbable


with the axes being log(rank order) and log(frequency). For example, in Figure 1.12 the word “the” as described in the above scatterplot would have the coordinates x = log(1), y = log(69,971). The data conform to Zipf's law to the extent that the plotted points appear to fall along a single straight-line segment.
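The straight-line criterion can be checked numerically by fitting a line to the points (log(rank), log(frequency)). The following is a minimal sketch, not a method taken from the text: the helper name `loglog_slope` is hypothetical, and the input is synthetic data built to follow an exact 1/r law rather than real word counts.

```python
import math

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    `frequencies` must already be sorted from most to least frequent,
    so that the item at index i has rank i + 1.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic frequencies following an exact 1/r law (illustrative only).
synthetic = [1000.0 / r for r in range(1, 51)]
slope = loglog_slope(synthetic)
print(slope)  # close to -1 for an exact 1/r law
```

For real data the fitted slope estimates the negative of the Zipf exponent, and the scatter around the fitted line indicates how well the data conform to the law.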

Formally, let L be the number of elements in the data set, let r be the rank of a data element when ordered from the most to the least frequent, and let η be the value of the exponent characterizing the distribution. Zipf's law then predicts that, out of a population of L elements, the frequency of elements of rank r, f(r; η, L), is given by

$$f(r; \eta, L) = \frac{1/r^{\eta}}{\sum_{n=1}^{L} 1/n^{\eta}}. \tag{2.54}$$
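As an illustration, the predicted frequency of each rank can be computed directly from the definition: the weight 1/r^η divided by the sum of the same weights over all L ranks. This is a sketch under stated assumptions; the function name `zipf_frequencies` and the choice L = 10 are hypothetical and serve only to keep the output short.

```python
def zipf_frequencies(L, eta=1.0):
    """Return [f(1), ..., f(L)] with f(r) = r**-eta / sum_{n=1}^{L} n**-eta."""
    weights = [r ** -eta for r in range(1, L + 1)]
    norm = sum(weights)  # the denominator, shared by every rank
    return [w / norm for w in weights]

# Small illustrative population with the classic exponent eta = 1.
freqs = zipf_frequencies(L=10, eta=1.0)
for r, f in enumerate(freqs, start=1):
    print(r, round(f, 4))
```

Because the denominator is the same for every rank, the ratio of any two predicted frequencies depends only on the two ranks and on η, not on L.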

Thus, in the example of the frequency of words in the English language given in the first chapter, L is the number of words in the English language and, if we use the classic version of Zipf's law, the exponent η is one. The distribution function f(r; η, L) is the fraction of the time the r th most common word occurs within a given language. Moreover, it is easily seen from its definition that the distribution is normalized, that is, the predicted frequencies sum to one:

$$\sum_{r=1}^{L} f(r; \eta, L) = 1. \tag{2.55}$$
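The normalization takes one line to verify. Writing the frequency of rank r as r^{-η} divided by the sum of n^{-η} over all L ranks, the denominator does not depend on r and can be pulled out of the sum:

```latex
\sum_{r=1}^{L} f(r; \eta, L)
  = \sum_{r=1}^{L} \frac{r^{-\eta}}{\sum_{n=1}^{L} n^{-\eta}}
  = \frac{\sum_{r=1}^{L} r^{-\eta}}{\sum_{n=1}^{L} n^{-\eta}}
  = 1 .
```

The result holds for any exponent η and any finite L, since the numerator and denominator are the same finite sum.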

The simplest case of Zipf's law is a “1/f function.” Given a set of Zipfian distributed frequencies, sorted from most common to least common, the second most common frequency will occur 1/2 as often as the first. The third most common frequency will occur 1/3 as often as the first. The n th most common frequency will occur 1/n as often as the first. However, this cannot hold exactly, because items must occur an integer number of times: there cannot be 2.5 occurrences of a word. Nevertheless, over fairly wide ranges, and to a fairly good approximation, many natural and social phenomena obey Zipf's law.
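The integer constraint is easy to see with the count quoted earlier for the word “the”. The sketch below assumes that 69,971 is the count of the most frequent word and computes the exact 1/n share for the next few ranks; the helper name `expected_counts` is hypothetical.

```python
from fractions import Fraction

def expected_counts(top_count, max_rank):
    """Exact counts top_count / n predicted by a strict 1/n law, for n = 1..max_rank."""
    return [Fraction(top_count, n) for n in range(1, max_rank + 1)]

top_count = 69_971  # count of the most frequent word, as quoted in the text

for n, expected in enumerate(expected_counts(top_count, 5), start=1):
    is_integer = expected.denominator == 1
    print(n, float(expected), is_integer)
```

None of the shares for ranks 2 to 5 is a whole number, so observed counts can only match a strict 1/n law approximately.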

Mathematically, it is not possible for the classic version of Zipf's law to hold exactly if there are infinitely many words in a language, since the sum in the denominator of (2.54) is then a harmonic series, which diverges as L grows without bound:

$$\sum_{r=1}^{\infty} \frac{1}{r} = \infty. \tag{2.56}$$
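The divergence is slow, which is why it is easy to overlook. A short numerical sketch compares the partial sums of the harmonic series with the standard asymptotic estimate ln N + γ, where γ is the Euler-Mascheroni constant. The function name `harmonic` and the sample values of N are illustrative choices, not from the text.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic(N):
    """Partial sum H_N = sum_{r=1}^{N} 1/r."""
    return math.fsum(1.0 / r for r in range(1, N + 1))

for N in (10, 1_000, 100_000):
    # The partial sum keeps growing, tracking ln(N) + gamma ever more closely.
    print(N, harmonic(N), math.log(N) + EULER_GAMMA)
```

Each thousandfold increase in N adds only about ln(1000) ≈ 6.9 to the sum, yet the sum never levels off, so no finite normalization constant exists in the limit of infinitely many words.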

In the English language, the frequencies of the approximately 1,000 most frequently used words are empirically found to be approximately proportional to 1/r^η, where η is just slightly above one. As long as the exponent η exceeds 1, it is possible for such a law to hold with infinitely many words, since if η > 1 the sum no longer diverges,

$$\zeta(\eta) = \sum_{r=1}^{\infty} \frac{1}{r^{\eta}} < \infty, \tag{2.57}$$

where ζ is the Riemann zeta function.
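Convergence for η > 1 can be illustrated with partial sums. The sketch below uses η = 2, an assumption made purely because the limit is then known in closed form, ζ(2) = π²/6; it is not the slightly-above-one value reported for English, for which convergence is far slower. The remainder after N terms is bounded by the integral-test estimate N^(1−η)/(η−1). The function name `zeta_partial` is hypothetical.

```python
import math

def zeta_partial(eta, N):
    """Partial sum sum_{r=1}^{N} r**-eta of the series defining zeta(eta)."""
    return math.fsum(r ** -eta for r in range(1, N + 1))

eta = 2.0                      # illustrative exponent with a known limit
exact = math.pi ** 2 / 6       # closed form of zeta(2)

for N in (10, 1_000, 100_000):
    partial = zeta_partial(eta, N)
    tail_bound = N ** (1 - eta) / (eta - 1)  # integral-test bound on the remainder
    print(N, partial, exact - partial, tail_bound)
```

The gap between the partial sum and the limit shrinks with N and stays below the tail bound, in contrast with the η = 1 case where the partial sums grow without limit.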

Just why data from complex webs, such as those of Auerbach, Lotka, Willis and others shown in Chapter 1, conform to the distribution of Zipf is a matter of some
