Cryptography Reference
In-Depth Information
to recover the bits would need to know something about the grammar
that was used to produce the sentences. This would be kept
secret by both sides of the transmission. Figuring out the grammar
that generated a particular set of sentences is not easy. The ambiguous
grammar example on page 110 shows how five production rules
can produce a number of sentences in two different ways. Because
there are so many different possible grammars that could generate each
sentence, it would be practically impossible to search through all of
them.

Margin note: The Alicebot project lets computers chatter in natural
languages. Imagine if they were encoding information at the same
time? (www.alicebot.org)
Nor is it particularly feasible to reconstruct the grammar. Deciding
where the words produced from one variable end and the words
produced by another variable begin is a difficult task. You might be
able to draw such an inference when you find the same sentence
type repeated again and again.
These reasons don't guarantee the security of the system by any
means. They just offer some intuition for why it might be hard to
recover the bits hidden with a complicated grammar. Section 7.3.4
on page 128 discusses some of the deeper reasons to believe in the
security of the system.

Margin note: “Scrambled Grammars” on page 119 shows how to
rearrange grammars for more security.
7.3 Creating Grammar-Based Mimicry
Producing software to do context-free mimicry is not complicated.
You only need to have a basic understanding of how to parse text,
generate some random numbers, and break up data into individual
bits.

Margin note: A C version of the code is also available on the
code disk. It is pretty much a straight conversion.
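
The last of these skills, breaking data into individual bits, can be sketched in a few lines of Python. This is only an illustration, not the book's code: the function names bytes_to_bits and bits_to_bytes are hypothetical, and the choice to emit the most significant bit of each byte first is an assumption.

```python
def bytes_to_bits(data: bytes):
    """Yield the bits of `data` one at a time.

    Assumption: the most significant bit of each byte comes first.
    """
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def bits_to_bytes(bits) -> bytes:
    """Pack a sequence of 0/1 values back into bytes (inverse of bytes_to_bits)."""
    bits = list(bits)
    if len(bits) % 8 != 0:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

A sender would feed bits from the first function to whatever routine chooses productions, and a receiver would collect the recovered bits and repack them with the second.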
Several details of the code bear explaining. The best place to
begin is the format for the grammar files.
Figure 7.3 shows a scrap from the baseball context-free grammar
illustrated in Figure 7.1.
The variables begin with the asterisk character and must be one
contiguous word. A better editor and parser combination would be
able to distinguish variables from ordinary words and remove this restriction.
Starting with a bogus character like the asterisk is the best compromise.
Although it diminishes readability, it guarantees that there won't be
any ambiguity.
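
The asterisk convention makes it cheap to tell variables from ordinary words. The sketch below assumes a production phrase is split on whitespace; the names is_variable and classify_tokens, and the sample variables *WhoHit and *Pitch, are hypothetical and are not taken from the book's grammar files.

```python
def is_variable(token: str) -> bool:
    """A token is a variable if it starts with '*' and has a name after it."""
    return len(token) > 1 and token.startswith("*")


def classify_tokens(phrase: str):
    """Split a phrase on whitespace and flag each token as variable or not.

    Returns a list of (token, is_variable) pairs. Punctuation attached to a
    token (for example "*Pitch.") is kept as part of the token, which is one
    reason variables must be a single contiguous word.
    """
    return [(tok, is_variable(tok)) for tok in phrase.split()]


# Hypothetical production phrase for illustration only.
tokens = classify_tokens("*WhoHit swings at the *Pitch")
```

Here the first and last tokens are flagged as variables to be expanded further, and the three words in between are emitted as written.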
The productions that could emerge from each variable are
separated by forward slashes. The pattern is: phrase / number /. The
final phrase for a variable has an extra slash after the last number.
The number is a weighting given to the random choice maker. In this
example, most of the weights are .1. The software simply adds up all
of the weights for a particular variable and divides each weight by this
total to normalize the choices.
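
The normalization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's parser: the function name parse_productions and the sample phrases are hypothetical, the input is assumed to be only the production list for one variable (how the variable name is attached is not covered here), and phrases are assumed not to contain slashes themselves.

```python
def parse_productions(text: str):
    """Parse 'phrase / number / phrase / number //' into normalized choices.

    Returns a list of (phrase, probability) pairs whose probabilities sum
    to 1. Empty fields, including the one left by the extra closing slash,
    are discarded, so an empty phrase cannot be expressed in this sketch.
    """
    fields = [field.strip() for field in text.split("/")]
    fields = [field for field in fields if field]
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating phrase / number fields")
    phrases = fields[0::2]
    weights = [float(w) for w in fields[1::2]]  # float(".1") parses as 0.1
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [(phrase, weight / total) for phrase, weight in zip(phrases, weights)]


# Hypothetical production list: weights .1, .1, .2 add up to .4,
# so dividing through gives roughly 0.25, 0.25, and 0.5.
choices = parse_productions("Bob / .1 / Harry / .1 / Lefty / .2 //")
```

Because only the ratios matter after dividing by the total, weights of .1 and .1 describe the same choice as weights of 1 and 1.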