but even when it is transliterated the distribution is different. Each
language and even each regional dialect has a different composition.
The texts generated here could fool such an automatic scanning
device because the output is statistically equivalent to honest English
text. For instance, the letter “e” is the most common and the letter
“t” is the next most common. Everything looks statistically correct at
all of the different orders. If the scanning software were looking for
statistical deviance, it wouldn't find it.
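The kind of check such a scanner might run can be sketched in a few lines of Python. This is only an illustration, not a description of any real scanning product: the function names, the choice of total variation distance, and the 0.15 threshold are all assumptions made for the example. It compares first-order (single-letter) frequencies only.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return the relative frequency of each letter a-z in text."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        return {c: 0.0 for c in string.ascii_lowercase}
    return {c: counts[c] / total for c in string.ascii_lowercase}


def total_variation(freq_a, freq_b):
    """Half the summed absolute differences: 0.0 is identical, 1.0 is disjoint."""
    return 0.5 * sum(abs(freq_a[c] - freq_b[c]) for c in string.ascii_lowercase)


def looks_normal(sample, reference, threshold=0.15):
    """Hypothetical scanner rule: accept text whose letter distribution
    stays within `threshold` of the reference distribution."""
    distance = total_variation(letter_frequencies(sample),
                               letter_frequencies(reference))
    return distance <= threshold
```

A text produced by mimicking the reference corpus would have nearly the same letter frequencies as the reference, so a rule of this shape would pass it.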
An automatic scanning program is also at a statistical disadvantage
with relatively short text samples. Its statistical definition of
what is normal must be loose enough to fit changes caused by the
focus of the text. A document about zebras, for instance, would have
many more “z”s than the average document, but this alone doesn't
make it abnormal. Many documents might have a higher-than-average
occurrence of “j”s or “q”s merely because the topic involves
something like jails or quiz shows.
Of course, these texts wouldn't fool a person, or at least the
first-, second-, and third-order texts wouldn't. But
a fifth-order text based on a sample from an obscure and difficult
jargon like legal writing might fool many people who aren't familiar
with the structures of the genre.
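A minimal sketch of an nth-order generator follows. It assumes that "nth-order" means each letter is chosen according to how often it follows the preceding n-1 letters in the source. The names build_model and generate are hypothetical, and wrapping the source around on itself is a convenience of this sketch so that every context has at least one follower.

```python
import random
from collections import Counter, defaultdict


def build_model(source, order):
    """Map each (order-1)-letter context to a Counter of the letters following it."""
    k = order - 1
    if order < 1 or not source or len(source) < k:
        raise ValueError("need order >= 1 and a source at least order-1 letters long")
    wrapped = source + source[:k]  # treat the source as circular
    model = defaultdict(Counter)
    for i in range(len(source)):
        model[wrapped[i:i + k]][wrapped[i + k]] += 1
    return model


def generate(source, order, length, seed=None):
    """Generate `length` characters that mimic the source's order-n statistics."""
    rng = random.Random(seed)
    model = build_model(source, order)
    k = order - 1
    out = source[:k]  # begin with the source's opening context
    while len(out) < length:
        context = out[len(out) - k:]  # the last k letters ("" when k == 0)
        followers = model[context]
        letters = list(followers)
        weights = list(followers.values())
        # Pick the next letter in proportion to how often it followed this context.
        out += rng.choices(letters, weights=weights)[0]
    return out[:length]
```

Calling generate(sample_text, 5, 200) on a long sample of legal writing, for example, would yield the sort of fifth-order text discussed above. With order 1 the context is empty and each letter is drawn independently by its overall frequency.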
More complicated statistical models can produce better mimicry,
at least in the right cases. Markov models, for instance, are common
in speech recognition, and genetic algorithms can do a good job
predicting some patterns. In general, any of the algorithms designed to
help a computer learn to recognize a pattern can be applied here to
suss out a pattern before being turned in reverse to imitate it.
More complicated grammatical analysis is certainly possible. There
are grammar checkers that scan documents and identify bad sentence
structure. These products are far from perfect. Many people
write idiomatically and others stretch the bounds of what is considered
correct grammar without breaking any of the rules. Although
honest text generated by humans may set off many flags, even the
fifth-order text shown in this chapter would appear so wrong that it
could be automatically detected. Any text that had, say, more wrong
than right with it could be flagged as suspicious by an automatic
process [KO84, Way85]. (Chapter 7 offers an approach to defeating
grammar checkers.)
6.2.1 Choosing the Next Letter
The last section showed how statistically equivalent text could be
generated by mimicking the statistical distribution of a source collection
of text. The algorithm showed how to choose the next letter