Information Technology Reference
In-Depth Information
In his exploration of models for natural language, Noam Chomsky introduced four lan-
guage types of decreasing expressibility, now called the Chomsky hierarchy , in which each
language is described by the type of grammar generating it. These languages serve as a basis for
the classification of programming languages. The four types are the phrase-structure languages,
the context-sensitive languages, the context-free languages, and the regular languages.
There is an exact correspondence between each of these types of languages and particular
machine architectures in the sense that for each language type T there is a machine architecture
A recognizing languages of type T and for each architecture A there is a type T such that all
languages recognized by A are of type T . The correspondence between language and architec-
ture is shown in the following table, which also lists the section or problem where the result is
established. Here the linear bounded automaton is a Turing machine in which the number
of tape cells that are used is linear in the length of the input string.
Level
Language Type
Machine Type
Proof Location
0
phrase-structure
Turing machine
Section 5.4
1
context-sensitive
linear bounded automaton
Problem 4.36
2
context-free
nondet. pushdown automaton
Section 4.12
3
regular
finite-state machine
Section 4.10
We now give formal definitions of each of the grammar types under consideration.
4.9.1 Phrase-Structure Languages
In Section 5.4 we show that the phrase-structure grammars defined below are exactly the lan-
guages that can be recognized by Turing machines.
DEFINITION 4.9.1 A phrase-structure grammar G is a four-tuple G =( N , T , R , S ) where
N
and
T
are disjoint alphabets of non-terminals and terminals , respectively. Let V =
N∪T
.
form a finite subset of V +
V (denoted
V +
V )whereforeveryrule
The rules
R
×
R⊆
×
( a , b )
∈R
, a contains at least one non-terminal symbol. The symbol S
∈N
is the start symbol .
V + and a is a contiguous substring of u ,then u can
be replaced by the string v by substituting b for a .Ifthisholds,wewrite u
If ( a , b )
∈R
we write a
b .If u
G v and call it an
immediate derivation . Extending this notation, if through a sequence of immediate derivations
(called a derivation ) u
, x n G v we can transform u to v ,we
write u G v and say that v derives from u .Iftherules
G x 1 , x 1
G x 2 ,
···
+ ,the
R
contain ( a , a ) for all a
∈N
relation
G and u
V
G is called the transitive closure of the relation
G u for all u
containing at least one non-terminal symbol.
The language L ( G ) defined by the grammar G is the set of all terminal strings that can be
derived from the start symbol S ;thatis,
S
∈T |
L ( G )=
{
u
G u
}
G and
G . These definitions are
best understood from an example. In all our examples we use letters in SMALL CAPS to denote
non-terminals and letters in italics to denote terminals, except that , the empty letter, may
also be a terminal.
When the context is clear we drop the subscript G in
Search WWH ::




Custom Search