Parsing - Introduction to Compiler Construction in a Java World

Java Reference

In-Depth Information

Parsing a token, like package , is simple enough. The token should be the first token in

the as-yet unscanned input sentence. If it is, we simply scan it; otherwise we raise an error.

Parsing a non-terminal is treated as another parsing (sub-) goal. For example, in the

package statement, once we have scanned the package token, we are left with parsing a

qualiedIdentier. This is defined by yet another BNF rule

qualiedIdentier ::= <identifier> f .<identifier> g

(3.16)

Here we scan an <identifier> (treated by the lexical scanner as a token). And, long as we

see another period in the input, we scan that period and scan another <identifier> .

That we start at the start symbol and continually rewrite non-terminals using BNF rules

until we eventually reach leaves (the tokens are the leaves) makes this a top-down parsing

technique. Because we, at each step in parsing a non-terminal, replace a parsing goal with

a sequence of sub-goals, we often call this a goal-oriented parsing technique.

How do we decide what next step to take? For example, how do we decide whether or

not there are more import statements to parse? We decide by looking at the next unscanned

input token. If it is an import , we have another import statement to parse; otherwise we

go on to parsing type declarations. This kind of decision is more explicitly illustrated by

the definition for the statement non-terminal (3.17).

statement ::= block

j <identifier>: statement

j if parExpression statement [ else statement]

j while parExpression statement

j return [expression] ;

j ;

j statementExpression ;

(3.17)

When faced with the goal of parsing a statement, we have six alternatives to choose

from, depending on the next unscanned input token:

1. If the next token is a { , we parse a block;

2. If the next token is an if , we parse an if statement;

3. If the next token is a while , we parse a while statement;

4. If the next token is a return , we parse a return statement;

5. If the next token is a semicolon, we parse an empty statement;

6. Otherwise (or based on the next token being one of a set of tokens, any one of which

may begin an expression) we parse a statementExpression.

In this instance, the decision may be made looking at the next single unscanned input

token; when this is the case, we say that our grammar is LL(1). In some cases, one must look

ahead several tokens in the input to decide which alternative to take. In all cases, because

we can predict which of several alternative right-hand sides of a BNF rule to apply, based

on the next input token(s), we say this is a predictive parsing technique.

There are two principal top-down (goal-oriented, or predictive) parsing techniques avail-

able to us:

1. Parsing by recursive descent; and

2. LL(1) or LL(k) parsing.

Introduction to Compiler Construction in a Java World

Search WWH ::

Custom Search

Home