Java Reference
In-Depth Information
Modern programming languages often contain a grammar in their specifi-
cation as a guide to those who teach, study, or use the language. Based on the
analysis discussed in this chapter, such grammars can also participate in the
automatic construction of syntax-checking parsers. Programming language
rules that are not easily expressed by grammars are enforced during the se-
mantic analyses discussed in Chapters 7, 8, and 9. In Chapter 2, we discuss
the rudiments of context-free grammars (CFGs) and define a simple language
using a CFGs. Here, we formalize the definition and notation for CFGss and
present algorithms that analyze such grammars in preparation for the parsing
techniques covered in Chapters 5 and 6.
4.1 Context-Free Grammars
Formally, a language is a set of finite-length strings over a finite alphabet.
Because most interesting languages are infinite sets, we cannot define such
languages by enumerating their elements. A context-free grammar (CFG) is
a compact, finite representation of a language, defined by the following four
components:
Afinite terminal alphabet
. This is the set of tokens produced by the
scanner. We always augment this set with the token $, which signifies
end-of-input.
Σ
Afinite nonterminal alphabet N . Symbols in this alphabet are variables
of the grammar.
A start symbol S
N that initiates all derivations . S is also called the
goal symbol .
A finite set of productions P (sometimes called rewriting rules )ofthe
form A
→X
1
...X m ,whereA
N ,
X i N ∪Σ
,1
i m ,and m
0. The
only valid production with m =
0isoftheformA
→λ
,where
λ
denotes
the empty string .
These components are often expressed as G
S ), which is the formal
definition of a CFG. The terminal and nonterminal alphabets must be disjoint
(i.e.,
=
( N ,Σ, P ,
). The vocabulary V of a CFG is the set of terminal and
nonterminal symbols (i.e., V =Σ∪ N ).
A CFG is essentially a recipe for creating strings. Starting with S,non-
terminals are rewritten using the grammar's productions until only terminals
remain. A rewrite using the production A
Σ∩ N = ∅
→α
replaces the nonterminal A with
the vocabulary symbols in
α
. As a special case, a rewrite using the production
A
→λ
causes A to be erased. Each rewrite is a step in a derivation of the
 
 
Search WWH ::




Custom Search