language. Suppose now that we have a language such that the relative frequencies of the alphabet characters are $p_0, p_1, \ldots, p_{r-1}$. Then, starting with Eq. 1.1 above, we have for a text of length $n$ with frequencies $f_0, f_1, \ldots, f_{r-1}$:
$$I_c(x) = \frac{\sum_{i=0}^{r-1} f_i (f_i - 1)}{n(n-1)} = \frac{\sum_{i=0}^{r-1} f_i^2 - n}{n^2 \left(1 - \frac{1}{n}\right)} = \frac{\sum_{i=0}^{r-1} \left(\frac{f_i}{n}\right)^2 - \frac{1}{n}}{1 - \frac{1}{n}}$$
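The chain of equalities above can be checked numerically. The following is a minimal Python sketch, not the book's Maple code; the function names `index_of_coincidence` and `index_of_coincidence_relative` and the sample string are assumptions made for illustration. Exact rational arithmetic is used so that the first and last expressions can be compared for equality rather than up to rounding.

```python
from collections import Counter
from fractions import Fraction


def index_of_coincidence(text: str) -> Fraction:
    """First form: sum of f_i (f_i - 1), divided by n (n - 1)."""
    n = len(text)
    if n < 2:
        raise ValueError("the IC needs a text of length at least 2")
    # Absolute frequencies f_i; characters that never occur contribute 0.
    counts = Counter(text)
    return Fraction(sum(f * (f - 1) for f in counts.values()), n * (n - 1))


def index_of_coincidence_relative(text: str) -> Fraction:
    """Last form: (sum of (f_i / n)^2 - 1/n), divided by (1 - 1/n)."""
    n = len(text)
    if n < 2:
        raise ValueError("the IC needs a text of length at least 2")
    counts = Counter(text)
    sum_of_squares = sum(Fraction(f, n) ** 2 for f in counts.values())
    return (sum_of_squares - Fraction(1, n)) / (1 - Fraction(1, n))


sample = "abracadabra"  # hypothetical sample text, n = 11
print(index_of_coincidence(sample))           # 12/55
print(index_of_coincidence_relative(sample))  # 12/55
```

For this sample the frequencies are 5, 2, 2, 1, 1, so the numerator of the first form is 20 + 2 + 2 = 24 and the denominator is 11 · 10 = 110, which reduces to 12/55.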
Now, for $n$ large, each of the relative frequencies $f_i/n$ in the text is close to $p_i$ and we obtain the following approximation:
$$I_c \approx \sum_{i=0}^{r-1} \left(\frac{f_i}{n}\right)^2 \approx \sum_{i=0}^{r-1} p_i^2$$
This value is the probability that two characters chosen at random in a text in
the given language are the same. We may apply a similar method to compute the
IC of the language consisting of random texts over a given alphabet. For such a
text (a text over the given alphabet, whose characters are randomly generated with
uniform probability distribution), the IC has the following value:
$$I_c \approx r \cdot \frac{1}{r^2} = \frac{1}{r}$$
We see that, as was to be expected, in this case the IC depends only on the size of the alphabet. For example, it is approximately 0.0384 for the 26-letter English alphabet and 0.037 for the 27-letter Spanish alphabet.
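The value $1/r$ for random text can also be observed empirically. The following is a minimal Python sketch, not the book's Maple code; the names `expected_ic` and `index_of_coincidence`, the seed, and the text length of 50,000 characters are assumptions chosen for illustration. It evaluates the sum of squared probabilities for a uniform distribution and compares it with the IC of a pseudo-random text over the 26 lowercase letters.

```python
import random
import string
from collections import Counter


def expected_ic(probabilities):
    """Large-n approximation of the IC: the sum of the squared probabilities p_i."""
    return sum(p * p for p in probabilities)


def index_of_coincidence(text):
    """IC of a text: sum of f_i (f_i - 1), divided by n (n - 1)."""
    n = len(text)
    counts = Counter(text)  # absolute frequencies f_i
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


r = 26
uniform = [1 / r] * r            # uniform distribution over r symbols
print(expected_ic(uniform))      # close to 1/26

rng = random.Random(2024)        # fixed seed so the run is repeatable
random_text = "".join(rng.choice(string.ascii_lowercase) for _ in range(50_000))
print(index_of_coincidence(random_text))  # close to 1/26 for a text this long
```

Passing a non-uniform list of probabilities to `expected_ic` gives a value larger than $1/r$, since the sum of squares of a probability distribution over $r$ symbols is smallest when the distribution is uniform.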
Exercise 1.11
Use the functions Generate and AddFlavor from Maple's RandomTools package to generate pseudo-random text strings of specified length over a given alphabet. Compute the IC of these text strings using the function Ic (or IndexOfCoincidence) and check that they are close to the expected value for a random string over the considered alphabet.
As we observed before, the IC increases when the frequency distribution of characters is more uneven (that is, when the language in question has more redundancy) and so the IC of a natural language with an alphabet of $r$ symbols is larger than $1/r$. We can use the above formula to compute the indices of coincidence of English and Spanish from their frequency distributions. The frequency distributions of the languages we are considering are given in Maple by the following procedure. The list of frequencies in English is taken from [130] and that for Spanish was compiled from a collection of texts of various kinds which included around 850,000 characters.
> freqlist := proc(language)
if language = en then
[0.0817, 0.0149, 0.0278, 0.0425, 0.1270, 0.0223, 0.0202, 0.0609, 0.0697,
0.0015, 0.0077, 0.0403, 0.0241, 0.0675, 0.0751, 0.0193, 0.0010, 0.0599,