Dana Angluin and Michael Kharitonov [AK91] extended the work
of Kearns and Valiant as well as the work of Moni Naor and M. Yung
[NY89, NY90]. Their work shows that there are no known polynomial-time
algorithms that predict membership in a class defined
by finite unions or intersections of finite automata or context-free
grammars.
These bounds deal with learning to predict whether a sentence
is in a class defined by a grammar, not with discovering its parse tree.
But the results can apply to the grammar-based system here if there
is some way that a parse-tree-discovering algorithm can be used to
predict membership.
Imagine that such an algorithm exists. Here is how to apply it
to predicting membership in some language defined by grammar
G_1, known as L(G_1). The start symbol for G_1 is S_1. Now suppose
there is another grammar, G_2, with start symbol S_2. Create a new
grammar, G, that is the union of G_1 and G_2 by creating a new start
symbol, S, and the productions S → S_1 and S → S_2. Take a set of strings
a_i ∈ L(G_1) ∪ L(G_2). Apply the algorithm
that can learn to predict parse trees and feed it this set of strings. If
such an algorithm can learn to predict the parse trees produced by the grammar, then
it can predict whether a string is in L(G): each string is either in
L(G_1) or L(G_2), and the first branch of its parse tree reveals which. If such an algorithm runs
in polynomial time, then it can be used to break RSA, factor Blum
integers, and solve other hard problems. Therefore, there is no known
algorithm to predict even the first branch of a parse tree.
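The reduction can be sketched in a few lines of Python. The learner `predict_parse_tree` is hypothetical (no such polynomial-time algorithm is known, which is the whole point), and the names `union_grammar` and `predict_membership` are illustrative, not from the text; a stub stands in for the learner so the sketch runs.

```python
# Sketch of the reduction: a parse-tree predictor for the union
# grammar G yields a membership predictor for L(G1) vs. L(G2).

def union_grammar(g1, g2):
    # Build G: a fresh start symbol S with productions S -> S1 and S -> S2.
    g = {"S": [["S1"], ["S2"]]}
    g.update(g1)
    g.update(g2)
    return g

def predict_membership(predict_parse_tree, string):
    # The root of the predicted parse tree is the first branch: it says
    # whether the string was derived through S1 (so it is in L(G1))
    # or through S2 (so it is in L(G2)).
    first_branch = predict_parse_tree(string)[0]
    return "L(G1)" if first_branch == "S1" else "L(G2)"

# Toy grammars: G1 generates strings of a's, G2 strings of b's.
G1 = {"S1": [["a"], ["a", "S1"]]}
G2 = {"S2": [["b"], ["b", "S2"]]}
G = union_grammar(G1, G2)

# A stub learner that "knows" the toy grammars, for illustration only.
stub_learner = lambda s: ["S1"] if s.startswith("a") else ["S2"]

print(predict_membership(stub_learner, "aaa"))  # L(G1)
print(predict_membership(stub_learner, "bb"))   # L(G2)
```

The stub makes the circularity explicit: any learner that recovers even the root production of the parse tree has already solved the membership problem that the Angluin-Kharitonov bounds show is as hard as breaking RSA.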
This result applies to the hardest grammars that might exist, but it
does not offer any clues on how to actually produce such a grammar.
An algorithm that could construct such a grammar and guarantee
that it was hard to discover would be quite a find. There are,
however, some minor observations that can be satisfying.
You can easily imagine a grammar that would be easy to break. If
each word or substring were visible in only one production, then it would
be relatively easy to isolate the string of productions that produced a
long section of text. The boundaries of the productions are simple to
establish by accumulating enough sample text that each production
is used twice. The two occurrences can be compared to reveal
the different parts of the production.
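This comparison is easy to sketch: given two samples produced by the same production, the shared prefix and suffix are its fixed text, and the differing middle marks where some other production was expanded. The function name below is illustrative, not from the text.

```python
def compare_occurrences(sample1, sample2):
    """Compare two outputs of the same production. The common prefix
    and suffix are the production's fixed text; the differing middles
    are where a variable (another production) was expanded."""
    limit = min(len(sample1), len(sample2))
    # Longest common prefix.
    p = 0
    while p < limit and sample1[p] == sample2[p]:
        p += 1
    # Longest common suffix that does not overlap the prefix.
    s = 0
    while s < limit - p and sample1[-1 - s] == sample2[-1 - s]:
        s += 1
    return (sample1[:p],                    # fixed prefix
            (sample1[p:len(sample1) - s],   # variable part, occurrence 1
             sample2[p:len(sample2) - s]),  # variable part, occurrence 2
            sample1[len(sample1) - s:])     # fixed suffix

print(compare_occurrences("the cat sat down", "the dog sat down"))
# ('the ', ('cat', 'dog'), ' sat down')
```

Two occurrences are enough to expose the boundaries here, which is exactly why a grammar whose words each appear in only one production leaks its structure so quickly.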
This leads to the observation that each word should appear in
multiple productions. The section beginning on page 119 describes
how contractions and expansions can be applied automatically to
change grammars so they fit this requirement.
How much contraction and expansion is enough? [Way95a] gives
one set of equations that can be used to measure the “randomness”
or “entropy” of a grammar. The equations are modelled on Shannon's
definition of entropy.
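As a sketch of the underlying idea (Shannon's standard entropy formula, not the [Way95a] equations themselves), the uncertainty of the choice among one nonterminal's productions, chosen with probabilities p_i, is H = -Σ p_i log₂ p_i bits. The function name is illustrative.

```python
import math

def choice_entropy(probabilities):
    """Shannon entropy, in bits, of the choice among one nonterminal's
    productions. Zero-probability choices contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(choice_entropy([0.5, 0.5]))                # 1.0 bit: a fair coin flip
print(choice_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: four equal choices
```

A nonterminal with low entropy is nearly deterministic, so its productions are easy to guess; pushing every choice toward the uniform case is what the contraction and expansion steps aim for.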