Information Technology Reference
In-Depth Information
In his exploration of models for natural language, Noam Chomsky introduced four lan-
guage types of decreasing expressibility, now called the
Chomsky hierarchy
, in which each
language is described by the type of grammar generating it. These languages serve as a basis for
the classification of programming languages. The four types are the phrase-structure languages,
the context-sensitive languages, the context-free languages, and the regular languages.
There is an exact correspondence between each of these types of languages and particular
machine architectures in the sense that for each language type
T
there is a machine architecture
A
recognizing languages of type
T
and for each architecture
A
there is a type
T
such that all
languages recognized by
A
are of type
T
. The correspondence between language and architec-
ture is shown in the following table, which also lists the section or problem where the result is
established. Here the
linear bounded automaton
is a Turing machine in which the number
of tape cells that are used is linear in the length of the input string.
Level
Language Type
Machine Type
Proof Location
0
phrase-structure
Turing machine
Section
5.4
1
context-sensitive
linear bounded automaton
Problem
4.36
2
context-free
nondet. pushdown automaton
Section
4.12
3
regular
finite-state machine
Section
4.10
We now give formal definitions of each of the grammar types under consideration.
4.9.1 Phrase-Structure Languages
In Section
5.4
we show that the phrase-structure grammars defined below are exactly the lan-
guages that can be recognized by Turing machines.
DEFINITION
4.9.1
A
phrase-structure grammar
G
is a four-tuple
G
=(
N
,
T
,
R
,
S
)
where
N
and
T
are disjoint alphabets of
non-terminals
and
terminals
, respectively. Let
V
=
N∪T
.
form a finite subset of
V
+
V
∗
(denoted
V
+
V
∗
)whereforeveryrule
The
rules
R
×
R⊆
×
(
a
,
b
)
∈R
,
a
contains at least one non-terminal symbol. The symbol
S
∈N
is the
start symbol
.
V
+
and
a
is a contiguous substring of
u
,then
u
can
be replaced by the string
v
by substituting
b
for
a
.Ifthisholds,wewrite
u
If
(
a
,
b
)
∈R
we write
a
→
b
.If
u
∈
⇒
G
v
and call it an
immediate derivation
. Extending this notation, if through a sequence of immediate derivations
(called a
derivation
)
u
,
x
n
⇒
G
v
we can transform
u
to
v
,we
write
u
⇒
G
v
and say that
v
derives
from
u
.Iftherules
⇒
G
x
1
,
x
1
⇒
G
x
2
,
···
+
,the
R
contain
(
a
,
a
)
for all
a
∈N
relation
∗
⇒
G
and
u
∗
V
∗
⇒
G
is called the
transitive closure
of the relation
⇒
G
u
for all
u
∈
containing at least one non-terminal symbol.
The
language
L
(
G
)
defined by the grammar
G
is the set of all terminal strings that can be
derived from the start symbol
S
;thatis,
S
∗
∈T
∗
|
L
(
G
)=
{
u
⇒
G
u
}
⇒
G
and
∗
⇒
G
. These definitions are
best understood from an example. In all our examples we use letters in
SMALL CAPS
to denote
non-terminals and letters in
italics
to denote terminals, except that
, the empty letter, may
also be a terminal.
When the context is clear we drop the subscript
G
in
Search WWH ::
Custom Search