it is not surprising that the five letters 'ompre' and 'press' also occur
84 times.
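The counts quoted above come from tallying every overlapping group of five characters in the sample. The Python sketch below shows that tally. It assumes overlapping windows over the raw characters, and it uses a made-up sample rather than the Chapter 5 draft. It illustrates why “ompre” and “press” can have identical counts: every occurrence of “compress” contains both.

```python
from collections import Counter


def count_combinations(text: str, n: int = 5) -> Counter:
    """Count every run of n consecutive characters in text (windows overlap)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical sample text, not the actual Chapter 5 draft.
counts = count_combinations("compression compresses compressed")
print(counts["ompre"], counts["press"])  # 3 3
```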
The text is generated in a process guided by these statistics. The
process begins by selecting one group of five letters at random. In the
Figure, the first five letters are “The l”. Then it uses the statistics to
dictate which letters can follow. In the draft of Chapter 5, the five
letters “he la” occur 2 times, the letters “he le” occur 16 times, and the
letters “he lo” occur 2 times. These are the only groups that begin with
“he l”, the last four letters of the seed, so together they account for
2 + 16 + 2 = 20 occurrences. If the fifth-order text is going to mimic
the statistical profile of Chapter 5, then there should be a 2 out of
20 chance that the letter “a” follows the random “The l”. Of
course, there should also be a 16 out of 20 chance that it is an
“e” and a 2 out of 20 chance that it is an “o”.
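The arithmetic behind those odds can be checked directly. This sketch turns the three counts from the text into exact probabilities and then makes one weighted draw for the letter after “The l”. The variable names and the fixed random seed are illustrative assumptions.

```python
import random
from fractions import Fraction

# Counts from the text for five-letter groups that begin with "he l".
followers = {"a": 2, "e": 16, "o": 2}
total = sum(followers.values())  # 20

# Exact probability of each candidate letter.
probabilities = {letter: Fraction(count, total) for letter, count in followers.items()}
print(probabilities)  # {'a': Fraction(1, 10), 'e': Fraction(4, 5), 'o': Fraction(1, 10)}

# One weighted draw for the letter that follows "The l".
rng = random.Random(5)
next_letter = rng.choices(list(followers), weights=list(followers.values()))[0]
print("The l" + next_letter)
```

Over many draws, roughly 80 percent of them should produce “The le”.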
This process is repeated over and over until enough text has been
generated. It is often amazing just how real the result sounds. To a large
extent, this is caused by the relatively small size of the sample text. If you
assume that there are about 64 printable characters in a text file, then
there are about $64^5$ (roughly 1.07 billion) different combinations of
five letters. Obviously, many of them, like “zqTuV”, never occur in the
English language, but a large number of them must make their way into
the table if the algorithm is to have many choices. In the last example,
there were only three possible choices for a letter to follow “The l”. The
phrase “The letter” is common in Chapter 5, but the phrase “The listerine”
is not. In many cases, there is only one possible choice, dictated by
the small number of words used in the sample. This is what gives the
output such a real-sounding pattern.
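A small experiment makes the sparsity argument concrete. The sketch compares the $64^5$ possible combinations with the handful that actually appear in a short sample. It then lists which letters can follow the context “he l”. Both the sample sentence and the helper `next_letters` are hypothetical, invented for illustration.

```python
from collections import Counter


def next_letters(sample: str, context: str) -> Counter:
    """Count which characters follow `context` anywhere in `sample` (hypothetical helper)."""
    k = len(context)
    return Counter(
        sample[i + k]
        for i in range(len(sample) - k)
        if sample[i:i + k] == context
    )


possible = 64 ** 5
sample = "The letter is here. The letter was late."
grams = {sample[i:i + 5] for i in range(len(sample) - 4)}

print(possible)  # 1073741824
print(len(grams), "distinct five-letter combinations in the sample")
print(next_letters(sample, "he l"))  # Counter({'e': 2})
```

In this tiny sample, only “e” ever follows “he l”, so the generator has no choice at that point. The same effect, at a larger scale, is what keeps output built from a short draft close to real words.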
Here's the algorithm for generating $n$th-order text, called $T$, given a
source text $S$:
1. Construct a list of all combinations of $n$ letters that occur in $S$
and keep track of how many times each of these occurs in $S$.
2. Choose one at random to be a seed. This will be the first $n$ letters
of $T$.
3. Repeat this loop until enough text is generated:
   (a) Take the last $n - 1$ letters of $T$.
   (b) Search through the statistical table and find all combinations
   of letters that begin with these $n - 1$ letters.
   (c) The last letters of these combinations are the set of possible
   choices for the next letter to be added to $T$.
   (d) Choose among these letters and use the frequency of their
   occurrences in $S$ to weight your choice.
   (e) Add it to $T$.
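The steps above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. Several choices are assumptions: the seed in step 2 is picked uniformly from the distinct combinations, generation stops early at a dead end where the last $n - 1$ letters only appear at the very end of $S$, and $n \ge 2$.

```python
import random
from collections import Counter


def build_table(source: str, n: int) -> Counter:
    """Step 1: count every n-letter combination that occurs in S."""
    return Counter(source[i:i + n] for i in range(len(source) - n + 1))


def generate(source: str, n: int, length: int, seed=None) -> str:
    """Generate up to `length` characters of nth-order text T that mimics S."""
    if n < 2:
        raise ValueError("this sketch assumes n >= 2")
    rng = random.Random(seed)
    table = build_table(source, n)

    # Step 2: a randomly chosen combination (uniform over distinct entries,
    # an assumption) becomes the first n letters of T.
    text = rng.choice(list(table))

    # Step 3: extend T one letter at a time.
    while len(text) < length:
        prefix = text[-(n - 1):]  # (a) the last n - 1 letters of T
        # (b) + (c): combinations that begin with the prefix; their last
        # letters are the candidates, each with its count from S.
        candidates = {combo[-1]: count for combo, count in table.items()
                      if combo.startswith(prefix)}
        if not candidates:
            break  # dead end: the prefix only occurs at the very end of S
        letters = list(candidates)
        # (d) weighted random choice, then (e) append it to T.
        text += rng.choices(letters, weights=[candidates[c] for c in letters])[0]
    return text


if __name__ == "__main__":
    sample = "The letter is here. The letter was late. The list is long."
    print(generate(sample, n=5, length=60, seed=1))
```

Scanning the whole table in step (b) mirrors the description above. A faster variant would index the table by its first $n - 1$ letters so that each lookup is a single dictionary access.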