Cryptography Reference
In-Depth Information
I was more lucky with cryptograms . These are popular crypto-puzzles, espe-
cially in the English-speaking world, where you have to guess the plaintext
from a given ciphertext. However, the word boundaries are not hidden, i.e.,
blanks are not removed or encrypted. So most of the solution programs are
based on frequency analyses with subsequent dictionary search — useless for
our task.
Only one method, the one by Edwin Olson described on the Web site at
www.gtoal.com/wordgames/cryptograms.html , appeared to be more apt.
It uses letter repeat patterns of words taken from a large dictionary. For
example, 'BANANA' has the pattern 'ABCBCB', 'ENTER' and 'ESTER' the
pattern 'ABCAD', and so on. A code breaker would proceed in a similar way:
he would first look for striking patterns.
Olson didn't disclose the source text of his program, and he described the
algorithm only roughly (by the way, he also utilized the knowledge of word
boundaries). But that was sufficient for me to build on this idea and develop my
own program, which you find on the Web site under subscrack.zip , including
documentation in English and German. It was written in the Python script
language, so that it is relatively independent of the operating system you use.
Since it is a demonstration program, it is limited to the original task, i.e., the
substitution of 26 letters without punctuation marks and blanks. My algorithm
shares only the basic idea with Olson's and looks like this:
1. Build a list of letter repeat patterns from a large dictionary. Assign a list
of words corresponding to this pattern to each pattern.
2. Search the ciphertext for a long word pattern using a short pertaining
word list. Use the first word from the list to obtain substitutions for some
letters.
3. Once the first match is found, look for the next pattern in the ciphertext.
Check for each word from its list whether or not there are contradictions:
identical letters always have to decipher to the same letter, different
letters to different ones. Moreover, subscrack checks whether there are
new letters. If there aren't, it tests for the next word in the list. When
the list is exhausted, the program continues with the next pattern. Then
a third pattern is found once an attempt was successful, and so on. Once
all patterns are exhausted, the program takes the next word in its list
from the last pattern but one, and so on. This is called a tree search .
4. Once all or almost all characters have been deciphered, the entire cipher-
text is tentatively decrypted. Next, the program uses a large dictionary to
Search WWH ::




Custom Search