Java Reference
In-Depth Information
enough to describe the many programming languages in use today. We look at these classes
of grammars in the next sections.
Because of their recursive nature, parsing languages described by context-free grammars
requires the use of a pushdown stack. In general, we can parse any language that is described
by a context-free grammar but more often than not, the algorithms that do so involve
backtracking.
But there are classes of grammars whose languages may be parsed without backtracking;
the parsing is said to be deterministic. At each step of the parsing algorithm, the next step
may be determined simply by looking at both the state of the pushdown stack and the
incoming symbol (as the input sentence is scanned from left to right). These deterministic
parsers may be roughly divided into two categories:
Top-down parsing: where the parser begins at the start symbol and, step-by-step de-
rives the input sentence and producing either a parse tree or (more likely) an abstract
syntax tree (AST) from the root down to its leaves.
Bottom-up parsing: where the parser begins at the input sentence and, scanning it
from left to right, applies the production rules, replacing right-hand sides with left-
hand-sides, in a step-by-step fashion for reducing the sentence to obtain the start
symbol. Here the parse tree or abstract syntax tree is built from the bottom leaves
up to the top root.
3.3 Top-Down Deterministic Parsing
There are two popular top-down deterministic parsing strategies: parsing by recursive de-
scent and LL(1) parsing. Both parsers scan the input from left to right, looking at and
scanning just one symbol at a time.
In both cases, the parser starts with the grammar's start symbol as an initial goal in
the parsing process; that is, the goal is to scan the input sentence and parse the syntactic
entity represented by the start symbol. The start symbol is then rewritten, using a BNF
rule replacing the symbol with the right-hand side sequence of symbols.
For example, in parsing a j-- program, our initial goal is to parse the start symbol,
compilationUnit. The compilationUnit is defined by a single extended-BNF rule.
compilationUnit ::= [ package qualiedIdentier ; ]
f import qualiedIdentier ; g
ftypeDeclarationg EOF
(3.15)
So, the goal of parsing a compilationUnit in the input can be rewritten as a number of
sub-goals:
1. If there is a package statement in the input sentence, then we parse that.
2. If there are import statements in the input, then we parse them.
3. If there are any type declarations, then we parse them.
4. Finally, we parse the terminating EOF token.
 
Search WWH ::




Custom Search