Simple Ciphers - Modern Cryptanalysis: Techniques for Advanced Code Breaking

An adequate representation of the English language is The Complete Works of William Shakespeare [3]. We

can easily calculate the index of coincidence, ignoring punctuation and spaces, by counting the occurrences of

each character and applying the above formula. In this case, we calculate it to be approximately 0.0639.
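The calculation just described can be sketched in a few lines of Python. This is a minimal sketch, not the author's code: the function name index_of_coincidence is hypothetical, and the sketch assumes the formula referred to above is the standard definition, the sum of count(c) × (count(c) − 1) over all characters, divided by length × (length − 1). It also assumes that only the letters A to Z are counted and that case is ignored.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Return the index of coincidence of the letters A-Z in `text`.

    Assumed definition (standard, not confirmed by this excerpt):
        sum(count(c) * (count(c) - 1)) / (length * (length - 1))
    Punctuation, digits, and spaces are ignored; case is folded.
    """
    # Keep only the 26 Latin letters, folded to upper case.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    length = len(letters)
    if length < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return sum(n * (n - 1) for n in counts.values()) / (length * (length - 1))


if __name__ == "__main__":
    sample = "To be, or not to be, that is the question."
    print(round(index_of_coincidence(sample), 4))
```

A sample this short gives a noisy value; a figure comparable to the one quoted for Shakespeare needs a large body of text.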

While Shakespeare provides an interesting reference point and is fairly representative of English, it is necessary to consider the source of the message you are analyzing. For example, if the source text is likely to be C code, a better reference might be a large collection of C code, such as the Linux kernel. The Linux 2.6.15.1 kernel has an I_C ≈ 0.0585. Or, if the text is in Klingon, we can take a sample of Klingon with a few English loan words (taken from about 156 kilobytes of the Qo'noS Qonos) and find I_C ≈ 0.0496.

The theoretically perfect I_C is the value we would get if all characters occurred exactly the same number of times, so that none was more likely than any other to be repeated. This is easy to calculate. For English, with the 26 characters of our Latin-based alphabet, each character would occur exactly 1/26 of the time. This means that, in the above equation, we can assume that length = 26 × count(c) for all c.

This gives us the following formula for the theoretical value in this perfectly uniform case, which is the lowest value the I_C can take. To make the formula easier to read, we write n for the count of each character. Because the counts become more precise as we get more and more ciphertext, we let the amount of ciphertext, and therefore n, approach infinity.

$$I_C = \lim_{n \to \infty} \sum_{c=1}^{26} \frac{n(n-1)}{26n(26n-1)}$$

We can simplify this a little (since we know that each part of the sum is always the same):

$$I_C = \lim_{n \to \infty} 26 \times \frac{n(n-1)}{26n(26n-1)}$$

And we can even simplify a little further:

$$I_C = \lim_{n \to \infty} \frac{n-1}{26n-1}$$

Most calculus courses teach L'Hôpital's Rule, which tells us that the above limit can be simplified again, giving our theoretical best:

I_C = 1/26 ≈ 0.03846

This can be seen intuitively by the fact that, as n gets very large, the subtraction of the constant 1 means very little to the value of the fraction, which is dominated by the n/(26n) part. This is simplified to 1/26.
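The limit can also be checked numerically. The sketch below assumes the standard index of coincidence formula, the sum of count(c) × (count(c) − 1) divided by length × (length − 1), and evaluates it exactly for a text in which each of the 26 characters occurs n times. The function name uniform_ic is hypothetical and used only for illustration.

```python
from fractions import Fraction


def uniform_ic(n: int, alphabet_size: int = 26) -> Fraction:
    """Exact I_C when each of `alphabet_size` characters occurs exactly n times.

    Assumes the standard formula:
        sum(count(c) * (count(c) - 1)) / (length * (length - 1))
    """
    length = alphabet_size * n
    # Every character contributes the same term, n * (n - 1).
    return Fraction(alphabet_size * n * (n - 1), length * (length - 1))


if __name__ == "__main__":
    for n in (2, 10, 1_000, 1_000_000):
        print(n, float(uniform_ic(n)))
    print("1/26 =", 1 / 26)
```

Exact fractions are used so that the finite-n values are not blurred by rounding. Under these assumptions each value equals (n − 1)/(26n − 1), which approaches 1/26 from below as n grows.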

Note that this technique does not allow us to actually break a cipher. This is simply a tool to provide us more

information about the text with which we are dealing.

1.5.1.3 Other Issues

There are some proposed methods of strengthening basic ciphers (monoalphabetic, polyalphabetic, transposition, or others). See Reference [5] for some of these examples.

One very simple method is to insert meaningless characters, called nulls, into the ciphertext. For example, the character X does not appear very often in texts. Therefore, we could just insert the letter X at random into the plaintext before encrypting. This technique isn't terribly difficult to spot: frequency analysis will show a fairly typical distribution of characters, except for one extra, large spike. Once any suspected nulls are removed, the analysis should be easier. Another common use of nulls is to remove the spaces from the plaintext and then add spaces to the ciphertext in a random, English-like manner.
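The effect of X nulls on a frequency count can be illustrated with a small sketch. Everything here is an assumption made for the example: the helper names add_nulls, strip_nulls, and most_common_letter are hypothetical, the insertion rate is arbitrary, and the encryption step is skipped, since a monoalphabetic cipher would only relabel the spike with a different letter.

```python
import random
from collections import Counter


def add_nulls(plaintext: str, null: str = "X", rate: float = 0.2, seed: int = 0) -> str:
    """Insert `null` after each character with probability `rate`."""
    rng = random.Random(seed)  # seeded only so the demonstration is repeatable
    out = []
    for ch in plaintext:
        out.append(ch)
        if rng.random() < rate:
            out.append(null)
    return "".join(out)


def strip_nulls(text: str, null: str = "X") -> str:
    """Remove every occurrence of the suspected null character."""
    return text.replace(null, "")


def most_common_letter(text: str) -> tuple:
    """Return (letter, count) for the most frequent letter in `text`."""
    counts = Counter(ch for ch in text.upper() if ch.isalpha())
    return counts.most_common(1)[0]


if __name__ == "__main__":
    message = "MEETMEATTHEUSUALPLACEATTENRATHERTHANEIGHTOCLOCK" * 5
    padded = add_nulls(message, rate=0.3)
    print("before nulls:", most_common_letter(message))
    print("after nulls: ", most_common_letter(padded))
    print("recovered:   ", strip_nulls(padded) == message)
```

Stripping the null recovers the message exactly only when the null character never occurs in the genuine text, as in this sample. Otherwise real occurrences are removed along with the nulls.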
