Cryptography Reference
In-Depth Information
language. Suppose now that we have a language such that the relative frequencies
of the alphabet characters are $p_0, p_1, \ldots, p_{r-1}$. Then, starting with Eq. 1.1 above, we
have for a text of length $n$ with frequencies $f_0, f_1, \ldots, f_{r-1}$:
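$$
I_c(x) = \frac{\sum_{i=0}^{r-1} f_i (f_i - 1)}{n(n-1)}
       = \frac{\sum_{i=0}^{r-1} f_i^2 - n}{n^2 \left(1 - \frac{1}{n}\right)}
       = \frac{\sum_{i=0}^{r-1} \left(\frac{f_i}{n}\right)^2 - \frac{1}{n}}{1 - \frac{1}{n}}
$$
Now, for $n$ large, each of the relative frequencies $f_i/n$ in the text is close to $p_i$ and
we obtain the following approximation: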
$$
I_c \approx \sum_{i=0}^{r-1} \left(\frac{f_i}{n}\right)^2 \approx \sum_{i=0}^{r-1} p_i^2
$$
This value is the probability that two characters chosen at random in a text in
the given language are the same. We may apply a similar method to compute the
IC of the language consisting of random texts over a given alphabet. For such a
text (a text over the given alphabet, whose characters are randomly generated with
uniform probability distribution), the IC has the following value:
$$
I_c = \sum_{i=0}^{r-1} \frac{1}{r^2} = \frac{1}{r}
$$
We see that, as was to be expected, in this case the IC depends only on the size
of the alphabet. For example, it is approximately 0.0384 for the 26-letter English
alphabet and 0.037 for the 27-letter Spanish alphabet.
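The formulas above can be checked numerically. The following is a minimal sketch in
Python, used here only for illustration since the book's own code is Maple. It computes
the exact IC of a string from its character counts, the large-$n$ approximation
$\sum (f_i/n)^2$, and the uniform value $1/r$. The function names
index_of_coincidence, approx_ic and uniform_ic are hypothetical and are not the Maple
functions Ic or IndexOfCoincidence.

```python
from collections import Counter
from fractions import Fraction


def index_of_coincidence(text):
    """Exact IC: sum of f_i * (f_i - 1) divided by n * (n - 1)."""
    n = len(text)
    if n < 2:
        raise ValueError("need at least two characters")
    counts = Counter(text)  # f_i for every character that occurs
    return Fraction(sum(f * (f - 1) for f in counts.values()), n * (n - 1))


def approx_ic(text):
    """Large-n approximation: sum of the squared relative frequencies."""
    n = len(text)
    return sum(Fraction(f, n) ** 2 for f in Counter(text).values())


def uniform_ic(r):
    """IC of uniformly random text over r symbols: r terms of (1/r)**2."""
    return sum(Fraction(1, r) ** 2 for _ in range(r))


sample = "abracadabra"  # toy input, far too short to be "n large"
print(float(index_of_coincidence(sample)))
print(float(approx_ic(sample)))
print(float(uniform_ic(26)), float(uniform_ic(27)))
```

Exact rational arithmetic is used so that the identity between the exact value and
the expression in relative frequencies can be verified without rounding error. For a
text as short as the sample, the approximation differs noticeably from the exact
value, which is why the text requires $n$ to be large.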
Exercise 1.11 Use the functions Generate and AddFlavor from Maple's
RandomTools package to generate pseudo-random text strings of specified length
over a given alphabet. Compute the IC of these text strings using the function Ic (or
IndexOfCoincidence ) and check that they are close to the expected value for
a random string over the considered alphabet.
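An experiment of the kind this exercise describes can also be sketched outside Maple.
The Python code below is not the Maple solution the exercise asks for. It assumes
Python's standard random module in place of RandomTools, and the names random_text
and index_of_coincidence are hypothetical. It generates a pseudo-random string over
the 26 lowercase letters and compares its IC with $1/r$.

```python
import random
import string
from collections import Counter


def index_of_coincidence(text):
    """Exact IC: sum of f_i * (f_i - 1) divided by n * (n - 1)."""
    n = len(text)
    counts = Counter(text)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def random_text(alphabet, length, rng):
    """Draw `length` characters uniformly and independently from `alphabet`."""
    return "".join(rng.choices(alphabet, k=length))


rng = random.Random(2024)  # fixed seed (arbitrary) so runs are repeatable
alphabet = string.ascii_lowercase  # r = 26
text = random_text(alphabet, 100_000, rng)

observed = index_of_coincidence(text)
expected = 1 / len(alphabet)
print(f"observed IC = {observed:.5f}, expected 1/r = {expected:.5f}")
```

Longer strings should bring the observed value closer to $1/r$, and changing the
alphabet changes only the expected value.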
As we observed before, the IC increases when the frequency distribution of characters
is more uneven (that is, when the language in question has more redundancy) and so
the IC of a natural language with an alphabet of $r$ symbols is larger than $1/r$.
We can use the approximation $I_c \approx \sum_{i=0}^{r-1} p_i^2$ obtained above to
compute the indices of coincidence of English and Spanish from their frequency
distributions. The frequency distribution of the languages we are considering is
given in Maple by the following procedure. The list of frequencies in English is
taken from [130] and that for Spanish was compiled from a collection of texts of
various kinds which included around 850,000 characters.
> freqlist := proc(language)
if language = en then
[0.0817, 0.0149, 0.0278, 0.0425, 0.1270, 0.0223, 0.0202, 0.0609, 0.0697,
0.0015, 0.0077, 0.0403, 0.0241, 0.0675, 0.0751, 0.0193, 0.0010, 0.0599,
 