Introduction to Cryptography and Data Security - Understanding Cryptography

Let's determine the key space of the substitution cipher: When choosing the

replacement for the first letter A, we randomly choose one letter from the 26 letters of

the alphabet (in the example above we chose k). The replacement for the next

alphabet letter B is randomly chosen from the remaining 25 letters, etc. Thus there

exist the following number of different substitution tables:

key space of the substitution cipher = 26 · 25 ··· 3 · 2 · 1 = 26! ≈ 2^88

Even with hundreds of thousands of high-end PCs such a search would take

several decades! Thus, we are tempted to conclude that the substitution cipher is

secure. But this is incorrect because there is another, more powerful attack.
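The size of the key space can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the variable names are illustrative and not taken from the text.

```python
import math

# Number of distinct substitution tables: 26 choices for A, 25 for B, and so on.
key_space = math.factorial(26)

# Equivalent key length in bits, i.e. log2 of the number of keys.
bits = math.log2(key_space)

print(key_space)        # 403291461126605635584000000
print(round(bits, 1))   # 88.4
```

The result of roughly 88 bits matches the approximation 26! ≈ 2^88.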

Second Attack: Letter Frequency Analysis

First we note that the brute-force attack from above treats the cipher as a black box,

i.e., we do not analyze the internal structure of the cipher. The substitution cipher

can easily be broken by an analytical attack that does exploit this structure.

The major weakness of the cipher is that each plaintext symbol always maps to

the same ciphertext symbol. That means that the statistical properties of the plaintext

are preserved in the ciphertext. If we go back to the second example we observe that

the letter q occurs most frequently in the text. From this we know that q must be the

substitution for one of the frequent letters in the English language.
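The following sketch illustrates why the statistics survive encryption. It assumes a randomly generated substitution table (not the key used in the book's examples) and, for simplicity, an uppercase ciphertext alphabet; the helper names are hypothetical. Because the table is a one-to-one mapping, the multiset of letter counts is identical before and after encryption; only the labels change.

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase

def random_substitution_table(seed=None):
    """Return a dict mapping each letter A-Z to a distinct replacement letter."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))

def encrypt(plaintext, table):
    """Replace each letter via the table; leave other characters unchanged."""
    return "".join(table.get(ch, ch) for ch in plaintext.upper())

def sorted_counts(text):
    """Letter counts in descending order, ignoring which letter has which count."""
    return sorted(Counter(ch for ch in text if ch in ALPHABET).values(), reverse=True)

plaintext = "WE WILL MEET IN THE MIDDLE OF THE LIBRARY AT NOON"
table = random_substitution_table(seed=1)  # assumed key, for illustration only
ciphertext = encrypt(plaintext, table)

# The count profile is unchanged, so the most frequent ciphertext letter
# stands for the most frequent plaintext letter.
print(sorted_counts(plaintext) == sorted_counts(ciphertext))
```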

For practical attacks, the following properties of language can be exploited:

1. Determine the frequency of every ciphertext letter. The frequency distribution,

often even of relatively short pieces of encrypted text, will be close to that of

the given language in general. In particular, the most frequent letters can often

easily be spotted in ciphertexts. For instance, in English E is the most frequent

letter (about 13%), T is the second most frequent letter (about 9%), A is the third

most frequent letter (about 8%), and so on. Table 1.1 lists the letter frequency

distribution of English.

2. The method above can be generalized by looking at pairs, triples, quadruples,

and so on of ciphertext symbols. For instance, in English (and some other

European languages), the letter Q is almost always followed by a U. This behavior

can be exploited to detect the substitution of the letter Q and the letter U.

3. If we assume that word separators (blanks) have been found (which is only

sometimes the case), one can often detect frequent short words such as THE, AND, etc.

Once we have identified one of these words, we immediately know three letters

(or whatever the length of the word is) for the entire text.

In practice, the three techniques listed above are often combined to break

substitution ciphers.
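The three techniques can be sketched as small counting helpers. This is a minimal illustration, not a complete attack: the function names are hypothetical, and the sample string is the plaintext of Example 1.3, used here only as a stand-in because counting works the same way on ciphertext.

```python
from collections import Counter

def letter_frequencies(text):
    """Technique 1: relative frequency of each letter, most frequent first."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    total = len(letters)
    return [(ch, n / total) for ch, n in Counter(letters).most_common()]

def ngram_counts(text, n=2):
    """Technique 2: counts of letter pairs (n=2), triples (n=3), and so on.

    N-grams are counted within blank-separated chunks; if no blanks are
    present, the whole text is treated as a single chunk.
    """
    counts = Counter()
    for word in text.upper().split():
        word = "".join(ch for ch in word if ch.isalpha())
        counts.update(word[i:i + n] for i in range(len(word) - n + 1))
    return counts

def short_word_counts(text, length=3):
    """Technique 3: counts of words of a given length, assuming blanks are known."""
    return Counter(w for w in text.upper().split() if len(w) == length and w.isalpha())

sample = ("WE WILL MEET IN THE MIDDLE OF THE LIBRARY AT NOON "
          "ALL ARRANGEMENTS ARE MADE")

print(letter_frequencies(sample)[:3])      # candidates for E, T, A, ...
print(ngram_counts(sample).most_common(3)) # frequent letter pairs
print(short_word_counts(sample))           # candidates for THE, AND, ...
```

On a real ciphertext, the top entries of each result are compared against known language statistics to propose letter assignments, which are then refined by hand or by search.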

Example 1.3. If we analyze the encrypted text from Example 1.2, we obtain:

WE WILL MEET IN THE MIDDLE OF THE LIBRARY AT NOON

ALL ARRANGEMENTS ARE MADE
