Parsing - Introduction to Compiler Construction in a Java World

Java Reference

In-Depth Information

enough to describe the many programming languages in use today. We look at these classes

of grammars in the next sections.

Because of their recursive nature, parsing languages described by context-free grammars

requires the use of a pushdown stack. In general, we can parse any language that is described

by a context-free grammar but more often than not, the algorithms that do so involve

backtracking.

But there are classes of grammars whose languages may be parsed without backtracking;

the parsing is said to be deterministic. At each step of the parsing algorithm, the next step

may be determined simply by looking at both the state of the pushdown stack and the

incoming symbol (as the input sentence is scanned from left to right). These deterministic

parsers may be roughly divided into two categories:

Top-down parsing: where the parser begins at the start symbol and, step-by-step de-

rives the input sentence and producing either a parse tree or (more likely) an abstract

syntax tree (AST) from the root down to its leaves.

Bottom-up parsing: where the parser begins at the input sentence and, scanning it

from left to right, applies the production rules, replacing right-hand sides with left-

hand-sides, in a step-by-step fashion for reducing the sentence to obtain the start

symbol. Here the parse tree or abstract syntax tree is built from the bottom leaves

up to the top root.

3.3 Top-Down Deterministic Parsing

There are two popular top-down deterministic parsing strategies: parsing by recursive de-

scent and LL(1) parsing. Both parsers scan the input from left to right, looking at and

scanning just one symbol at a time.

In both cases, the parser starts with the grammar's start symbol as an initial goal in

the parsing process; that is, the goal is to scan the input sentence and parse the syntactic

entity represented by the start symbol. The start symbol is then rewritten, using a BNF

rule replacing the symbol with the right-hand side sequence of symbols.

For example, in parsing a j-- program, our initial goal is to parse the start symbol,

compilationUnit. The compilationUnit is defined by a single extended-BNF rule.

compilationUnit ::= [ package qualiedIdentier ; ]

f import qualiedIdentier ; g

ftypeDeclarationg EOF

(3.15)

So, the goal of parsing a compilationUnit in the input can be rewritten as a number of

sub-goals:

1. If there is a package statement in the input sentence, then we parse that.

2. If there are import statements in the input, then we parse them.

3. If there are any type declarations, then we parse them.

4. Finally, we parse the terminating EOF token.

Search WWH ::

Custom Search

Home