Dana Angluin and Michael Kharitonov [AK91] extended the work
of Kearns and Valiant as well as the work of Moni Naor and M. Yung
[NY89, NY90]. Their work shows that there are no known polynomial-time
algorithms that predict membership in a class defined
by finite unions or intersections of finite automata or context-free
grammars.
These bounds deal with learning to predict whether a sentence
is in a class defined by a grammar, not with discovering its parse tree.
But the results can apply to the grammar-based system here if there
is some way that a parse-tree-discovering algorithm can be used to
predict membership.
Imagine that such an algorithm exists. Here is how to apply it
to predicting membership in some language defined by grammar
G_1, known as L(G_1). The start symbol for G_1 is S_1. Now suppose
there is another grammar, G_2, with start symbol S_2. Create a new
grammar, G, that is the union of G_1 and G_2 by creating a new start
symbol, S, and the productions S → S_1 and S → S_2. Take a set of strings
a_i ∈ L(G_1) ∪ L(G_2). Apply the algorithm
that can learn to predict parse trees and feed it this set of strings. If
such an algorithm can learn to predict the parse trees produced by the grammar, then
it can predict whether a string is in L(G): each string is either in
L(G_1) or L(G_2), and the first branch of its parse tree reveals which. If such an algorithm runs
in polynomial time, then it can be used to break RSA, factor Blum
integers, and solve other hard problems. Therefore, there is no known
algorithm to predict even the first branch of a parse tree.
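The reduction can be sketched in a few lines of Python. The learner `predict_parse_tree` is hypothetical (no such polynomial-time algorithm is known, which is the whole point), and the names `union_grammar` and `predict_membership` are illustrative, not from the text; a stub stands in for the learner so the sketch runs.

```python
# Sketch of the reduction: a parse-tree predictor for the union
# grammar G yields a membership predictor for L(G1) vs. L(G2).

def union_grammar(g1, g2):
    # Build G: a fresh start symbol S with productions S -> S1 and S -> S2.
    g = {"S": [["S1"], ["S2"]]}
    g.update(g1)
    g.update(g2)
    return g

def predict_membership(predict_parse_tree, string):
    # The root of the predicted parse tree is the first branch: it says
    # whether the string was derived through S1 (so it is in L(G1))
    # or through S2 (so it is in L(G2)).
    first_branch = predict_parse_tree(string)[0]
    return "L(G1)" if first_branch == "S1" else "L(G2)"

# Toy grammars: G1 generates strings of a's, G2 strings of b's.
G1 = {"S1": [["a"], ["a", "S1"]]}
G2 = {"S2": [["b"], ["b", "S2"]]}
G = union_grammar(G1, G2)

# A stub learner that "knows" the toy grammars, for illustration only.
stub_learner = lambda s: ["S1"] if s.startswith("a") else ["S2"]

print(predict_membership(stub_learner, "aaa"))  # L(G1)
print(predict_membership(stub_learner, "bb"))   # L(G2)
```

The stub makes the circularity explicit: any learner that recovers even the root production of the parse tree has already solved the membership problem that the Angluin-Kharitonov bounds show is as hard as breaking RSA.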
This result applies to the hardest grammars that might exist, but it
does not offer any clues on how to actually produce such a grammar.
An algorithm that could construct such a grammar and guarantee
that it was hard to discover would be quite a find. There are,
however, some minor observations that can be satisfying.
You can easily imagine a grammar that would be easy to break. If
each word or substring were visible in only one production, then it would
be relatively easy to isolate the string of productions that produced a
long section of text. The boundaries of the productions are simple to
establish by accumulating enough sample text that each production
is used twice. The two occurrences can be compared to reveal
the different parts of the production.
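This comparison is easy to sketch: given two samples produced by the same production, the shared prefix and suffix are its fixed text, and the differing middle marks where some other production was expanded. The function name below is illustrative, not from the text.

```python
def compare_occurrences(sample1, sample2):
    """Compare two outputs of the same production. The common prefix
    and suffix are the production's fixed text; the differing middles
    are where a variable (another production) was expanded."""
    limit = min(len(sample1), len(sample2))
    # Longest common prefix.
    p = 0
    while p < limit and sample1[p] == sample2[p]:
        p += 1
    # Longest common suffix that does not overlap the prefix.
    s = 0
    while s < limit - p and sample1[-1 - s] == sample2[-1 - s]:
        s += 1
    return (sample1[:p],                    # fixed prefix
            (sample1[p:len(sample1) - s],   # variable part, occurrence 1
             sample2[p:len(sample2) - s]),  # variable part, occurrence 2
            sample1[len(sample1) - s:])     # fixed suffix

print(compare_occurrences("the cat sat down", "the dog sat down"))
# ('the ', ('cat', 'dog'), ' sat down')
```

Two occurrences are enough to expose the boundaries here, which is exactly why a grammar whose words each appear in only one production leaks its structure so quickly.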
This leads to the observation that each word should appear in
multiple productions. The section beginning on page 119 describes
how contractions and expansions can be applied automatically to
change grammars so they fit this requirement.
How much contraction and expansion is enough? [Way95a] gives
one set of equations that can be used to measure the “randomness”
or “entropy” of a grammar. The equations are modelled on Shannon's
definition of entropy.
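As a sketch of the underlying idea (Shannon's standard entropy formula, not the [Way95a] equations themselves), the uncertainty of the choice among one nonterminal's productions, chosen with probabilities p_i, is H = -Σ p_i log₂ p_i bits. The function name is illustrative.

```python
import math

def choice_entropy(probabilities):
    """Shannon entropy, in bits, of the choice among one nonterminal's
    productions. Zero-probability choices contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(choice_entropy([0.5, 0.5]))                # 1.0 bit: a fair coin flip
print(choice_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: four equal choices
```

A nonterminal with low entropy is nearly deterministic, so its productions are easy to guess; pushing every choice toward the uniform case is what the contraction and expansion steps aim for.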