Parsing - Introduction to Compiler Construction in a Java World

Java Reference

In-Depth Information

E ) E + T

) E + T * F

) E + T *id

) E + F *id

) E +id*id

) T +id*id

) F +id*id

) id+id*id

A careful reading of the preceding steps suggests that explicitly pushing the symbols

onto the stack is unnecessary because the symbols are implied by the states themselves.

An industrial-strength LR(1) parser will simply maintain a stack of states. We include the

symbols only for illustrative purposes.

For all of this to work, we must go about constructing the tables Action and Goto. To

do this, we must rst construct the grammar's LR(1) canonical collection.

The LR(1) Canonical Collection

The LR(1) parsing tables, Action and Goto, for a grammar G are derived from a DFA for

recognizing the possible handles for a parse in G. This DFA is constructed from what is

called an LR(1) canonical collection, in turn a collection of sets of items of the form

[Y ::= , a ]

(3.32)

where Y ::= is a production rule in the set of productions P, and are (possibly

empty) strings of symbols, and a is a lookahead. The item represents a potential handle.

The is a position marker that marks the top of the stack, indicating that we have parsed

the and still have the ahead of us in satisfying the Y . The lookahead symbol, a , is a

token that can follow Y (and so, ) in a legal right-most derivation of some sentence.

If the position marker comes at the start of the right-hand side in an item,

[Y ::= , a ]

the item is called a possibility. One way of parsing the Y is to rst parse the and

then parse the , after which point the next incoming token will be an a . The parse

might be in the following configuration:

Stack

Input

#

u a :::

where ) u, where u is a string of terminals.

If the position marker comes after a string of symbols but before a string of symbols

in the right-hand side in an item,

[Y ::= , a ]

Search WWH ::

Custom Search

Home