Grammars and Parsing - Crafting a Compiler

Java Reference

In-Depth Information

•

A parser is considered top-down if it generates a parse tree by starting

at the root of the tree (the start symbol), expanding the tree by applying

productions in a depth-first manner. A top-down parse corresponds to

a preorder traversal of the parse tree. Top-down parsing techniques are

predictive in nature because they always predict the production that is to

be matched before matching actually begins. The top-down approach

includes the recursive-descent parser discussed in Chapter 2.

•

The bottom-up parsers generate a parse tree by starting at the tree's

leaves and working toward its root. A node is inserted in the tree only

after its children have been inserted. A bottom-up parse corresponds to

a postorder traversal of the parse tree.

The following grammar generates the skeletal block structure of a program-

ming language.

1 Program

→

begin Stmts end $

2 Stmts

→

Stmt

; Stmts

3

| λ

4 Stmt

→

simplestmt

5

|

begin Stmts end

Using this grammar, Figures 4.5 and 4.6 illustrate a top-down and bottom-

up parse of the string begin simplestmt ; simplestmt ; end $. Each box shows

one step of the parse, with the particular rule denoted by bold lines between

a parent (the rule's LHS) and its children (the rule's RHS). Solid, non-bold

lines indicate rules that have already been applied; dashed lines indicate rules

that have not yet been applied. For example, Figure 4.5(a) shows the rule

Program

begin Stmts end $ applied as the first step of a top-down parse.

Figure 4.6(f) shows the same rule applied as the last step of a bottom-up parse.

→

When specifying a parsing technique, we must state whether a leftmost

or rightmost parse will be produced. The best-known and most widely used

top-down and bottom-up parsing strategies are called LL and LR , respectively.

These names seem rather arcane, but they reflect how the input is processed

andwhich kind of parse is produced. In both cases, the first character (L) states

that the token sequence is processed from left to right. The second letter (L or

R) indicates whether a leftmost or rightmost parse is produced. The parsing

technique can be further characterized by the number of lookahead symbols

(i.e., symbols beyond the current token) that the parser may consult to make

parsing choices. LL(1) and LR(1) parsers are the most common, requiring only

one symbol of lookahead.

Crafting a Compiler

Search WWH ::

Custom Search

Home