to be found in a variety of phenomena. This imbalance is ultimately interpretable as the
implicit unfairness found in complex webs.
Let us set the words of a text in order of frequency of appearance, and rank them,
assigning the rank $r = 1$ to the most frequently used word, the rank $r = 2$ to the word
with the highest frequency after the first, and so on. The function $W(r)$ denotes the
number of times the word of rank $r$ appears. Zipf found

$$W(r) = \frac{K}{r^{\eta}}, \tag{2.58}$$

where $\eta \approx 1$ and $K$ is simply determined from (2.54).
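The ranking procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `rank_frequency` and `fit_zipf` are hypothetical, the tokenizer (lowercase runs of letters and apostrophes) is an arbitrary choice, and the least-squares fit on a log-log scale is a simple way to estimate $K$ and $\eta$, not necessarily the procedure Zipf used.

```python
from collections import Counter
import math
import re


def rank_frequency(text):
    """Return [W(1), W(2), ...]: word counts sorted in decreasing order.

    Assumption: a "word" is a lowercase run of letters or apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return sorted(counts.values(), reverse=True)


def fit_zipf(W):
    """Least-squares fit of log W(r) = log K - eta * log r.

    W is the list returned by rank_frequency and needs at least two
    entries. Returns the pair (K, eta).
    """
    xs = [math.log(r) for r in range(1, len(W) + 1)]
    ys = [math.log(w) for w in W]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    eta = -slope                       # Zipf exponent
    K = math.exp(mean_y - slope * mean_x)
    return K, eta


# Synthetic text whose counts follow W(r) = 120 / r exactly.
vocabulary = [("the", 120), ("of", 60), ("and", 40), ("to", 30), ("in", 24)]
sample_text = " ".join(word for word, n in vocabulary for _ in range(n))

W = rank_frequency(sample_text)
K, eta = fit_zipf(W)
print(W)
print(K, eta)
```

On a real text the fitted exponent depends on the tokenizer and on the range of ranks included, so the value obtained should be read as indicative only.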
Let us imagine that the number of times a word appears, $W$, can be interpreted as the
wealth of that word. This makes it possible to define the probability $\Pi(W)$, namely the
probability that a wealth larger than $W$ exists. According to Pareto the distribution of
wealth (actually Pareto's data were on the distribution of income, but we shall not dwell
on the distinction between income and wealth here) is

$$\Pi(W) = \frac{A}{W^{k}}. \tag{2.59}$$
The distribution density $\psi(W)$ is given by the derivative of the probability with respect
to the web variable $W$,

$$\psi(W) = -\frac{d}{dW}\Pi(W), \tag{2.60}$$
which yields

$$\psi(W) = \frac{B}{W^{a}}, \tag{2.61}$$

with the normalization constant given by

$$B = kA \tag{2.62}$$

and the new power-law index related to the old by

$$a = k + 1. \tag{2.63}$$
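The step from (2.60) to (2.61)-(2.63) can be written out explicitly. This is a sketch that assumes the Pareto form (2.59) holds over the whole range of $W$ considered, with $\Pi(W)$ denoting the probability that a wealth larger than $W$ exists:

```latex
\begin{align*}
\psi(W) &= -\frac{d}{dW}\left(A\,W^{-k}\right) \\
        &= kA\,W^{-(k+1)} \\
        &= \frac{B}{W^{a}}, \qquad B = kA, \quad a = k + 1 .
\end{align*}
```

For instance, $k = 1$ gives a density that falls off as $W^{-2}$.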
Now let us take (2.58) into account in the distribution of wealth. Imagine that we
randomly select, with probability density $p(r)$, a word of rank $r$ from the collection of
words with distribution number given by (2.58). A relation between the wealth variable
and a continuous version of the rank variable is established using the equality between
the probability of realizing wealth in the interval $(W, W + dW)$ and having the rank in
the interval $(r, r + dr)$,

$$\psi(W)\,dW = p(r)\,dr. \tag{2.64}$$
We are exploring the asymptotic condition $r \gg 1$, and this makes it possible for us to
move from the discrete to the continuous representation. The equality (2.64) generates
a relation between the two distribution densities in terms of the Jacobian between the
two variates,