From Sixteenth-Century Cryptography to the New Millennium — The Last 500 Years - Codes: The Guide to Secrecy from Ancient to Modern Times


the probability that two letters selected at random from a ciphertext are identical. Below we show how to demonstrate mathematically that the index of coincidence for a monoalphabetic cipher is about 0.065, and that the index of coincidence for a polyalphabetic cipher lies somewhere between 0.0385 and 0.065. For very long keywords, the index of coincidence for polyalphabetic ciphers will be closer to 0.0385. Hence, by a simple analysis of intercepted ciphertext, a cryptanalyst can determine with relative ease the type of cryptosystem being used. This was quite a breakthrough. Moreover, his idea contained a mechanism for determining the probable keylength, as had Kasiski's. Here is how it works.

First we need a table of letter frequencies for the English alphabet. This well-known, standard table (presented here as Table 2.5) augments Tables 1.4 and 1.5, which we presented on pages 44 and 45 when we discussed letter frequencies in Section 1.4.

Now suppose that $n$ stands for the number of letters in a ciphertext, and $n_j$ stands for the number of occurrences in that ciphertext of the letter in the $j$-th position of the English alphabet. In other words, $n_1$ is the number of occurrences of the letter a, $n_2$ is the number of occurrences of the letter b, and so on. Without getting into the reasons for it, the Index of Coincidence, IC, is given approximately by the following.

$$\mathrm{IC} \approx \frac{n_1^2}{n^2} + \frac{n_2^2}{n^2} + \cdots + \frac{n_{26}^2}{n^2}$$

Figure 2.14: Elizabeth S. Friedman.
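The approximation for IC can be sketched in a few lines of Python. This is only an illustration: the function name `index_of_coincidence` and the choice to ignore case, spaces, and punctuation are assumptions made here, not part of the text.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Approximate IC: the sum over the letters of (n_j / n) ** 2."""
    # Keep only the letters a-z, folding case; everything else is ignored
    # (an assumption of this sketch).
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    n = len(letters)  # n: total number of letters in the ciphertext
    if n == 0:
        raise ValueError("text contains no letters")
    counts = Counter(letters)  # counts[ch] plays the role of n_j
    # Letters that never occur contribute 0, so summing over the
    # letters that do occur is enough.
    return sum((n_j / n) ** 2 for n_j in counts.values())
```

For a string such as "aabb", n = 4 and the two counts are both 2, so the function returns (2/4)^2 + (2/4)^2 = 0.5.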

So if we want to compute IC for the English language from Table 2.5, then, since each of the numbers in the table is a percentage, we divide each by 100 and get:

$$\mathrm{IC} \approx (0.08167)^2 + (0.01492)^2 + \cdots + (0.00074)^2 \approx 0.065,$$

which explains the aforementioned Index of Coincidence for monoalphabetic ciphers, since the frequency distribution is invariant under a monoalphabetic substitution: the letters are relabeled, but their counts do not change. (Note that the symbol ≈ means “approximately equal to”. It is not a strict equality, but approximations are good enough for a statistical analysis such as this one.)
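The invariance claim can be checked with a small experiment. The sketch below uses a Caesar shift as a stand-in for a monoalphabetic cipher; the helper names `ic` and `caesar` and the sample sentence are hypothetical choices made for this illustration.

```python
import math
import string
from collections import Counter


def ic(text: str) -> float:
    """Sum of squared relative letter frequencies (the approximate IC)."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


def caesar(text: str, shift: int) -> str:
    """Shift each letter a-z by `shift` places: a simple monoalphabetic cipher."""
    alphabet = string.ascii_lowercase
    k = shift % 26
    table = str.maketrans(alphabet, alphabet[k:] + alphabet[:k])
    return text.lower().translate(table)


plaintext = "the index of coincidence is unchanged by relabelling the letters"
ciphertext = caesar(plaintext, 3)

# The substitution only renames letters, so the multiset of counts,
# and hence the IC, is the same before and after encryption.
same = math.isclose(ic(plaintext), ic(ciphertext))
```

Here `same` is true for any shift, because the counts are merely attached to different letters.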

Table 2.5: Relative Letter Frequencies for English (percent)

a      b      c      d      e       f      g      h      i
8.167  1.492  2.782  4.253  12.702  2.228  2.015  6.094  6.966

j      k      l      m      n       o      p      q      r
0.153  0.772  4.025  2.406  6.749   7.507  1.929  0.095  5.987

s      t      u      v      w       x      y      z
6.327  9.056  2.758  0.978  2.360   0.150  1.974  0.074
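As a check on the arithmetic, the sum of squares over Table 2.5 can be evaluated directly. The sketch below also compares it with the value for 26 equally likely letters; reading the figure 0.0385 as that uniform case is an assumption made here, and the variable names are illustrative only.

```python
# Table 2.5 percentages for a..z, in alphabetical order.
ENGLISH_PERCENT = [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]

# Divide by 100 to turn percentages into probabilities, then sum the squares.
ic_english = sum((p / 100) ** 2 for p in ENGLISH_PERCENT)

# Assumption for comparison: 26 equally likely letters, each with probability 1/26.
ic_uniform = 26 * (1 / 26) ** 2

print(round(ic_english, 3))  # 0.065
print(round(ic_uniform, 4))  # 0.0385
```

The two printed values match the upper and lower figures quoted for monoalphabetic and long-keyword polyalphabetic ciphers.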
