Information Technology Reference
In-Depth Information
the blank tape letter
β
when the stack is not empty, it enters the
accept state
f
(Rule (g)). If
the PDA encounters an
a
on its input tape when in state
p
,an
a
has been received after a
b
and the input is rejected (Rule (i)). After the PDA enters either the accept or reject states, it
remains there (Rules (j) and (k)).
In Section
4.12
we show that the languages recognized by pushdown automata are exactly
the languages defined by the context-free languages described in the next section.
4.9 Formal Languages
Languages are introduced in Section
1.2.3
.A
language
is a set of strings over a finite set
Σ
,
with
2, called an
alphabet
.
Σ
∗
is the language of all strings over
Σ
including the
empty
string
, which has zero length. The empty string has the property that for an arbitrary string
w
,
w
=
w
=
w
.
Σ
+
is the set
Σ
∗
without the empty string.
In this section we introduce grammars for languages, rules for
rewriting strings
through
the substitution of substrings. A
grammar
consists of alphabets
|
Σ
|≥
T
N
of terminal and
non-terminal symbols, respectively, a designated non-terminal start symbol, plus a set of rules
R
and
for rewriting strings. Below we define four types of language in terms of their grammars:
the phrase-structure, context-sensitive, context-free, and regular grammars.
The role of grammars is best illustrated with an example for a small fragment of English.
Consider a grammar
G
whose non-terminals
N
contain a start symbol
S
denoting a generic
sentence and
NP
and
VP
denoting generic noun and verb phrases, respectively. In turn, assume
that
N
also contains non-terminals for adjectives and adverbs, namely
AJ
and
AV
.Thus,
N
=
{
S
,
NP
,
VP
,
AJ
,
AV
,
N
,
V
}
. We allow the grammar to have the following words as terminals:
T
.Here
bob
,
alice
,and
duck
are nouns,
big
is an adjective,
smiles
and
quacks
are verbs, and
loudly
is an adverb. In our fragment of
English a sentence consists of a noun phrase followed by a verb phrase, which we denote by the
rule
S
=
{
bob
,
alice
,
duck
,
big
,
smiles
,
quacks
,
loudly
}
of the grammar are shown below. They include
rules to map non-terminals to terminals, such as
N
→
NP VP
. This and the other rules
R
→
bob
→
→
bob
→
smiles
S
NP VP
N
V
→
→
alice
→
quacks
NP
N
N
V
→
→
duck
→
loudly
NP
AJ N
N
AV
→
→
big
VP
V
AJ
→
VP
VAV
With these rules the following strings (sentences) can be generated:
bob smiles
;
big duck
quacks loudly
;and
alice quacks
. The first two sentences are acceptable English sentences,
but the third is not if we interpret
alice
as a person. This example illustrates the need for rules
that limit the rewriting of non-terminals to an appropriate context of surrounding symbols.
Grammars for formal languages generalize these ideas. Grammars are used to interpret
programming languages. A language is translated and given meaning through a series of steps
the first of which is
lexical analysis
. In lexical analysis symbols such as
a
,
l
,
i
,
c
,
e
are grouped
into tokens such as
alice
, or some other string denoting
alice
. This task is typically done with
a finite-state machine. The second step in translation is
parsing
, a process in which a tokenized
string is associated with a series of
derivations
or applications of the rules of a grammar. For
example,
big duck quacks loudly
, can be produced by the following sequence of derivations:
S
→
NP VP
;
NP
→
AJ N
;
AJ
→
big
;
N
→
duck
;
VP
→
VAV
;
V
→
quacks
;
AV
→
loudly
.
Search WWH ::
Custom Search