Figure 1-2 Frequency distribution table for “vanilla” Linux 2.6.15.1 source code (including only alphabetic
characters). The total size is approximately 205 megabytes.
1.5.1.2 Index of Coincidence
One of the first questions we might ask is whether a particular message is encrypted at all and, if it is
encrypted, how. Based on our discussion above about the different kinds of cryptography, we would want to know
whether the message was encrypted with a mono- or polyalphabetic cipher so that we can begin to recover the
key.
We can begin with the index of coincidence (the IC), a very useful tool that gives us some information about
the suspect ciphertext. It measures the probability that two characters drawn at random from the text are the
same, based on the frequency analysis of the text. You can think about it as a measure of how evenly distributed
the character frequencies are within the frequency distribution table: the lower the number, the more evenly
distributed.
For example, in unencrypted English, we know that letters such as E and S appear more often than X and Z. If a
monoalphabetic cipher is used to encrypt the plaintext, then the individual letter frequencies will be preserved,
although mapped to a different letter. Conveniently, the IC is calculated so that the actual character does not
matter; instead, it is based on the ratio of the number of times each character appears to the total number of
characters.
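To illustrate that a monoalphabetic cipher preserves the frequency distribution and only relabels the letters, here is a minimal Python sketch. The helper names (`frequency_profile`, `substitute`), the sample sentence, and the randomly generated substitution key are all hypothetical choices for illustration; only the letters A-Z are counted and case is ignored.

```python
import random
import string
from collections import Counter


def frequency_profile(text: str) -> list[float]:
    """Letter frequencies (count / total letters), sorted high to low.

    The letters themselves are discarded, so only the *shape* of the
    distribution remains. Only A-Z are counted, case-insensitively.
    """
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    total = len(letters)
    return sorted((n / total for n in Counter(letters).values()), reverse=True)


def substitute(text: str, key: str) -> str:
    """Monoalphabetic substitution: plaintext letter A-Z at position i becomes key[i]."""
    return text.upper().translate(str.maketrans(string.ascii_uppercase, key))


# Hypothetical substitution key: a random permutation of A-Z.
rng = random.Random(42)
key = "".join(rng.sample(string.ascii_uppercase, 26))

plaintext = "Meet me near the old stone bridge at seven this evening"
ciphertext = substitute(plaintext, key)

print(ciphertext)
# The letters change, but the multiset of frequencies does not.
print(frequency_profile(plaintext) == frequency_profile(ciphertext))  # True
```

Because the substitution is a one-to-one mapping of the alphabet, every plaintext letter count reappears unchanged under a new label, which is why a statistic built only from those counts cannot tell the plaintext and the ciphertext apart.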
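The index of coincidence is calculated by the following:
$$ \mathrm{IC} = \sum_{i=\mathrm{A}}^{\mathrm{Z}} \frac{f_i (f_i - 1)}{N (N - 1)} $$
where f_i is the number of times letter i appears in the ciphertext and N is the length of the ciphertext.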
This means that we take each letter of the alphabet, count how many times it appears in the text, multiply that
count by the same count minus one, and divide by the ciphertext length times the ciphertext length minus one.
When we add all of these values together, we have calculated the probability that two characters drawn at random
from the ciphertext are the same letter.
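The calculation just described translates almost line for line into code. This is a minimal Python sketch, assuming that only the 26 letters A-Z are counted (case-insensitively) and that N is the number of such letters; the function name `index_of_coincidence` is a hypothetical choice.

```python
import string
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Sum of f_i * (f_i - 1) over the letters A-Z, divided by N * (N - 1).

    f_i is the count of letter i; N is the total number of letters counted.
    Non-letters are ignored and case is folded to upper case.
    """
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters to compute the IC")
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


print(index_of_coincidence("AAAA"))  # 1.0: every pair of letters matches
print(index_of_coincidence("ABCD"))  # 0.0: no two letters match
print(index_of_coincidence("AABB"))  # 4 / 12, about 0.333
```

The `n < 2` guard is there because the denominator N(N - 1) is zero for texts with fewer than two letters.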
How do polyalphabetic ciphers factor into this? In this case, the same plaintext letter will not always be
encrypted with the same alphabet, so many of its appearances will be spread across other ciphertext letters in a
rather random fashion, which flattens the frequency distribution. As the frequency distribution becomes flatter,
the IC becomes smaller, since the ciphertext preserves less information about the underlying letter frequencies.
For a perfectly uniform distribution over 26 letters, the IC approaches 1/26, or about 0.038.
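As a hedged illustration of this flattening, the sketch below compares the IC of a plaintext with the IC of two encryptions of it. A Caesar shift stands in for a monoalphabetic cipher and a Vigenère cipher for a polyalphabetic one. The choice of these two ciphers, the key `LEMON`, the sample sentence, and all helper names are assumptions made for illustration.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase


def index_of_coincidence(text: str) -> float:
    """IC over the letters A-Z: sum f_i * (f_i - 1) / (N * (N - 1))."""
    letters = [c for c in text.upper() if c in ALPHABET]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    return sum(f * (f - 1) for f in Counter(letters).values()) / (n * (n - 1))


def shift(c: str, k: int) -> str:
    """Shift a single upper-case letter k places forward, wrapping at Z."""
    return ALPHABET[(ALPHABET.index(c) + k) % 26]


def caesar(text: str, k: int) -> str:
    """Monoalphabetic: every letter uses the same shift k (non-letters dropped)."""
    return "".join(shift(c, k) for c in text.upper() if c in ALPHABET)


def vigenere(text: str, key: str) -> str:
    """Polyalphabetic: the shift cycles through the key letters (A=0 ... Z=25)."""
    letters = [c for c in text.upper() if c in ALPHABET]
    shifts = [ALPHABET.index(k) for k in key.upper()]
    return "".join(shift(c, shifts[i % len(shifts)]) for i, c in enumerate(letters))


plaintext = (
    "It was the best of times it was the worst of times it was the age of wisdom "
    "it was the age of foolishness it was the epoch of belief it was the epoch of incredulity"
)
for label, text in [
    ("plaintext", plaintext),
    ("caesar", caesar(plaintext, 3)),
    ("vigenere", vigenere(plaintext, "LEMON")),
]:
    print(f"{label:10s} IC = {index_of_coincidence(text):.4f}")
```

The Caesar line always matches the plaintext exactly, because the shift only relabels letter counts. For ordinary English text, the Vigenère value should typically come out lower, although short or highly repetitive samples can vary. An extreme case shows the effect clearly: `"A" * 260` has an IC of 1.0, but enciphered with the 26-letter key `ABC...Z`, every letter appears exactly 10 times and the IC drops to about 0.035, close to the uniform value of 1/26.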