Cryptography Reference
In-Depth Information
Let's determine the key space of the substitution cipher: When choosing the replacement for the first letter A, we randomly choose one letter from the 26 letters of the alphabet (in the example above we chose k). The replacement for the next alphabet letter B is randomly chosen from the remaining 25 letters, etc. Thus the number of different substitution tables is:
$$\text{key space of the substitution cipher} = 26 \cdot 25 \cdots 3 \cdot 2 \cdot 1 = 26! \approx 2^{88}$$
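The product above can be checked directly. The following Python sketch (names are illustrative) multiplies the 26 choices together and converts the result into an equivalent key length in bits:

```python
import math

# One factor per alphabet letter: 26 choices for A, 25 for B, ..., 1 for Z.
key_space = math.prod(range(1, 27))
assert key_space == math.factorial(26)

print(key_space)                         # exact number of substitution tables
print(round(math.log2(key_space), 2))    # equivalent key length in bits
```

The logarithm comes out at roughly 88.4, which is where the approximation 26! ≈ 2^88 comes from.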
Even with hundreds of thousands of high-end PCs, an exhaustive search over this key space would take several decades! Thus, we are tempted to conclude that the substitution cipher is secure. But this is incorrect, because there is another, more powerful attack.
Second Attack: Letter Frequency Analysis
First we note that the brute-force attack from above treats the cipher as a black box, i.e., we do not analyze the internal structure of the cipher. An analytical attack, in contrast, exploits that internal structure, and the substitution cipher can easily be broken this way.
The major weakness of the cipher is that each plaintext symbol always maps to the same ciphertext symbol. As a result, the statistical properties of the plaintext are preserved in the ciphertext. If we go back to the second example, we observe that the letter q occurs most frequently in the text. From this we know that q is very likely the substitute for one of the most frequent letters of the English language.
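To see why the statistics survive, here is a small Python sketch. It uses a randomly generated substitution table (a hypothetical one, since the table from the earlier example is not reproduced here) and the plaintext from Example 1.3, then checks that every letter count reappears unchanged under its new label:

```python
import random
import string
from collections import Counter

def random_substitution_table(rng):
    """Build a random substitution table: a permutation of the 26 letters."""
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_uppercase, shuffled))

def encrypt(plaintext, table):
    """Replace every letter via the table; leave blanks unchanged."""
    return "".join(table.get(ch, ch) for ch in plaintext.upper())

rng = random.Random(1)  # fixed seed only so the demo is reproducible
table = random_substitution_table(rng)
plaintext = "WE WILL MEET IN THE MIDDLE OF THE LIBRARY AT NOON"
ciphertext = encrypt(plaintext, table)

pt_counts = Counter(c for c in plaintext if c.isalpha())
ct_counts = Counter(c for c in ciphertext if c.isalpha())

# Each plaintext letter always maps to the same ciphertext letter,
# so every count survives encryption; only the labels change.
for letter, n in pt_counts.items():
    assert ct_counts[table[letter]] == n
print(sorted(pt_counts.values()) == sorted(ct_counts.values()))  # True
```

Whatever seed is chosen, the histogram of the ciphertext is just a relabeled copy of the plaintext histogram, which is exactly what the attacker exploits.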
For practical attacks, the following properties of language can be exploited:
1. Determine the frequency of every ciphertext letter. The frequency distribution,
often even of relatively short pieces of encrypted text, will be close to that of
the given language in general. In particular, the most frequent letters can often
easily be spotted in ciphertexts. For instance, in English E is the most frequent
letter (about 13%), T is the second most frequent letter (about 9%), A is the third
most frequent letter (about 8%), and so on. Table 1.1 lists the letter frequency
distribution of English.
2. The method above can be generalized by looking at pairs, triples, quadruples, and so on of ciphertext symbols. For instance, in English (and some other European languages), the letter Q is almost always followed by a U. This behavior can be exploited to identify the substitutes for the letters Q and U.
3. If we assume that word separators (blanks) have been found (which is only sometimes the case), one can often detect frequent short words such as THE, AND, etc. Once we have identified one of these words, we immediately know three letters (or whatever the length of the word is) for the entire text.
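The three techniques can be sketched in Python as below. This is a minimal illustration, not a complete solver: the English frequency order in `ENGLISH_ORDER` is an assumed, commonly cited ordering (rarer letters vary between sources), and the toy ciphertext is produced by a hypothetical shift-by-three table applied to the plaintext of Example 1.3, only so that the expected answers are known.

```python
from collections import Counter

# Assumed approximate English letter order, most to least frequent.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def letter_ranking(ciphertext):
    """Step 1: ciphertext letters sorted from most to least frequent."""
    counts = Counter(c for c in ciphertext.lower() if c.isalpha())
    return [letter for letter, _ in counts.most_common()]

def first_guess(ciphertext):
    """Naive key guess: i-th most frequent ciphertext letter -> i-th English letter."""
    return dict(zip(letter_ranking(ciphertext), ENGLISH_ORDER))

def top_bigrams(ciphertext, n=5):
    """Step 2: most frequent adjacent letter pairs inside words."""
    pairs = Counter()
    for word in ciphertext.lower().split():
        pairs.update(word[i:i + 2] for i in range(len(word) - 1))
    return pairs.most_common(n)

def frequent_words(ciphertext, length=3, n=5):
    """Step 3: most frequent words of a given length (assumes blanks survive)."""
    words = [w for w in ciphertext.lower().split() if len(w) == length]
    return Counter(words).most_common(n)

# Hypothetical toy ciphertext: shift each letter by 3, keep blanks.
plain = "WE WILL MEET IN THE MIDDLE OF THE LIBRARY AT NOON ALL ARRANGEMENTS ARE MADE"
ciphertext = "".join(
    chr((ord(c) - ord("A") + 3) % 26 + ord("a")) if c.isalpha() else c
    for c in plain
)

top = letter_ranking(ciphertext)[0]
print(top, "->", first_guess(ciphertext)[top])   # h -> E
print(top_bigrams(ciphertext, 1))                # [('du', 3)], i.e. AR
print(frequent_words(ciphertext, 3, 1))          # [('wkh', 2)], i.e. THE
```

On a text this short only the top few letters of the naive guess are reliable; the bigram and short-word counts are what let an attacker correct the rest.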
In practice, the three techniques listed above are often combined to break substitution ciphers.
Example 1.3. If we analyze the encrypted text from Example 1.2, we obtain:
WE WILL MEET IN THE MIDDLE OF THE LIBRARY AT NOON
ALL ARRANGEMENTS ARE MADE