I. The probability that the $(n+1)$st word is a word that has already appeared exactly $k$ times is proportional to $k\,n\,p_{k,n}$, the total number of occurrences of all words that have appeared exactly $k$ times.
II. There is a constant probability $\kappa$ that the $(n+1)$st word is a new word.
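The two assumptions can be turned into a small Monte Carlo experiment. The sketch below is illustrative only: the function name `simulate_text`, the integer word labels, the seed and the parameter values are assumptions made for this example, not part of the original argument. Reusing an old word is implemented by drawing uniformly from all words written so far, so a word already used $k$ times is chosen with probability proportional to $k$.

```python
import random
from collections import Counter

def simulate_text(n_words, kappa, seed=0):
    """Generate a text of n_words integer-labelled words (hypothetical helper).

    With probability kappa the next word is new (assumption II); otherwise a
    previously written word is copied, chosen uniformly among all earlier
    positions in the text (one way to realise assumption I).
    """
    rng = random.Random(seed)
    text = [0]            # the first word is necessarily new
    next_label = 1
    for _ in range(n_words - 1):
        if rng.random() < kappa:
            text.append(next_label)        # a new word enters the text
            next_label += 1
        else:
            text.append(rng.choice(text))  # reuse, weighted by past usage
    return text

text = simulate_text(20000, kappa=0.1)
occurrences = Counter(text)                    # word -> times it was used
words_with_k = Counter(occurrences.values())   # k -> number of distinct words used k times

for k in range(1, 6):
    print(k, words_with_k[k])
```

The histogram `words_with_k` is the empirical counterpart of the number of words occurring exactly $k$ times.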
The stochastic process describing the probability that a particular word will be the
next one written depends on what words have already been written. The master equation for the new number $(n+1)\,p_{k,n+1}$ of words occurring exactly $k$ times is given by employing the two assumptions:
$$(n+1)\,p_{k,n+1} = n\,p_{k,n} + \theta\left[(k-1)\,p_{k-1,n} - k\,p_{k,n}\right], \tag{7.32}$$
where $\theta = 1 - \kappa$ is the probability that the $(n+1)$st word is not a new word. The form of this master equation is given by Newman using a somewhat longer argument involving speciation and is a version of the equation given by Simon. As both Newman and Simon point out, the only exception to (7.32) is for $k = 1$, which obeys the equation
$$(n+1)\,p_{1,n+1} = n\,p_{1,n} + 1 - \theta\,p_{1,n}. \tag{7.33}$$
Note that the number of words in this formulation corresponds to the discrete time index,
so that in this master equation, as observed by Newman, the analogue of the passage of
time is the incidence of new events, which in (7.32) is the writing of words.
We can now determine the asymptotic form of the solution to the master equations for
the number of words being generated in this mythical topic. The asymptotic or number-
independent form of the probability is given by taking the limit
$$p_k = \lim_{n\to\infty} p_{k,n}, \tag{7.34}$$
so that taking this limit in (7.33) yields
$$p_1 = 1 - \theta\,p_1 \quad\Longrightarrow\quad p_1 = \frac{1}{1+\theta}. \tag{7.35}$$
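The approach to this limit can be checked numerically by iterating the master equations (7.32) and (7.33) directly. The following is a minimal sketch under stated assumptions: the function name, the starting state (a single word written once at $n = 1$), the cutoff `k_max` and the value $\theta = 0.5$ are all choices made for illustration. The list `counts` stores $n\,p_{k,n}$, so each update is exactly one step of the master equation.

```python
def iterate_master_equation(theta, n_steps, k_max):
    """Iterate (7.32)-(7.33) and return p[k] = p_{k,n} at n = n_steps.

    counts[k] holds n * p_{k,n}; index 0 is unused. Occupation numbers
    above k_max are simply not tracked (an assumed truncation), which
    does not affect the values for k <= k_max.
    """
    counts = [0.0] * (k_max + 1)
    counts[1] = 1.0  # assumed start: one word, used once, at n = 1
    for n in range(1, n_steps):
        new = counts[:]
        # k = 1 obeys the special equation (7.33)
        new[1] = counts[1] + 1.0 - theta * counts[1] / n
        # k >= 2 obeys the general equation (7.32)
        for k in range(2, k_max + 1):
            gain = (k - 1) * counts[k - 1]
            loss = k * counts[k]
            new[k] = counts[k] + theta * (gain - loss) / n
        counts = new
    return [c / n_steps for c in counts]

theta = 0.5
p = iterate_master_equation(theta, n_steps=20000, k_max=10)
print("iterated p_1 :", p[1])
print("1/(1+theta)  :", 1.0 / (1.0 + theta))
```

For large `n_steps` the iterated value of $p_{1,n}$ should settle close to the fixed point in (7.35).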
Applying the same limit to (7.32) yields
$$p_k = \theta\left[(k-1)\,p_{k-1} - k\,p_k\right],$$
which can be rearranged to provide the iteration equation
$$p_k = \frac{k-1}{k+1/\theta}\,p_{k-1}. \tag{7.36}$$
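The rearrangement amounts to collecting the $p_k$ terms on one side and dividing through by $\theta$. Written out step by step, assuming only $\theta > 0$:

```latex
\begin{align*}
p_k &= \theta\left[(k-1)\,p_{k-1} - k\,p_k\right] \\
(1 + \theta k)\,p_k &= \theta\,(k-1)\,p_{k-1} \\
p_k &= \frac{\theta\,(k-1)}{1 + \theta k}\,p_{k-1}
     = \frac{k-1}{k + 1/\theta}\,p_{k-1}.
\end{align*}
```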
The solution to the iteration equation is provided in both Newman [49] and Simon [58] by using (7.35) as the initial value to be
$$p_k = \frac{1}{\theta}\,\frac{\Gamma(k)\,\Gamma(1+1/\theta)}{\Gamma(k+1+1/\theta)} = \frac{1}{\theta}\,B(k,\,1+1/\theta). \tag{7.37}$$
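As a sanity check, the closed form (7.37) can be compared with the values obtained by iterating (7.36) from the initial value (7.35). The sketch below is illustrative: the function names, the choice $\theta = 0.5$ and the cutoff of 2000 terms are assumptions. The beta function is evaluated through `math.lgamma` to avoid overflow of the gamma function at large $k$.

```python
import math

def yule_closed_form(k, theta):
    """Evaluate (1/theta) * B(k, 1 + 1/theta) via log-gamma functions."""
    b = 1.0 + 1.0 / theta
    log_beta = math.lgamma(k) + math.lgamma(b) - math.lgamma(k + b)
    return math.exp(log_beta) / theta

def yule_by_recursion(k_max, theta):
    """Iterate p_k = (k-1)/(k + 1/theta) * p_{k-1} from p_1 = 1/(1+theta)."""
    p = [0.0, 1.0 / (1.0 + theta)]  # index 0 is unused
    for k in range(2, k_max + 1):
        p.append(p[k - 1] * (k - 1) / (k + 1.0 / theta))
    return p

theta = 0.5
k_max = 2000
p = yule_by_recursion(k_max, theta)
max_gap = max(abs(p[k] - yule_closed_form(k, theta)) for k in range(1, k_max + 1))
total = sum(p[1:])

print("largest |recursion - closed form|:", max_gap)
print("partial sum of p_k up to k_max   :", total)
```

The two evaluations should agree to within floating-point error, and the partial sum should approach 1 as `k_max` grows.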
Here the ratio of gamma functions is denoted by $B(a,b)$, the beta function, which Simon called the Yule distribution, since this is the distribution obtained by Yule in his mathematical argument for the distribution of new species [71]. As is well known in classical analysis, the beta function has a power-law tail $B(a,b) \sim a^{-b}$, from which
 