Java Reference
In-Depth Information
A parser is considered top-down if it generates a parse tree by starting
at the root of the tree (the start symbol), expanding the tree by applying
productions in a depth-first manner. A top-down parse corresponds to
a preorder traversal of the parse tree. Top-down parsing techniques are
predictive in nature because they always predict the production that is to
be matched before matching actually begins. The top-down approach
includes the recursive-descent parser discussed in Chapter 2.
The bottom-up parsers generate a parse tree by starting at the tree's
leaves and working toward its root. A node is inserted in the tree only
after its children have been inserted. A bottom-up parse corresponds to
a postorder traversal of the parse tree.
The following grammar generates the skeletal block structure of a program-
ming language.
1 Program
begin Stmts end $
2 Stmts
Stmt
; Stmts
3
| λ
4 Stmt
simplestmt
5
|
begin Stmts end
Using this grammar, Figures 4.5 and 4.6 illustrate a top-down and bottom-
up parse of the string begin simplestmt ; simplestmt ; end $. Each box shows
one step of the parse, with the particular rule denoted by bold lines between
a parent (the rule's LHS) and its children (the rule's RHS). Solid, non-bold
lines indicate rules that have already been applied; dashed lines indicate rules
that have not yet been applied. For example, Figure 4.5(a) shows the rule
Program
begin Stmts end $ applied as the first step of a top-down parse.
Figure 4.6(f) shows the same rule applied as the last step of a bottom-up parse.
When specifying a parsing technique, we must state whether a leftmost
or rightmost parse will be produced. The best-known and most widely used
top-down and bottom-up parsing strategies are called LL and LR , respectively.
These names seem rather arcane, but they reflect how the input is processed
andwhich kind of parse is produced. In both cases, the first character (L) states
that the token sequence is processed from left to right. The second letter (L or
R) indicates whether a leftmost or rightmost parse is produced. The parsing
technique can be further characterized by the number of lookahead symbols
(i.e., symbols beyond the current token) that the parser may consult to make
parsing choices. LL(1) and LR(1) parsers are the most common, requiring only
one symbol of lookahead.
 
Search WWH ::




Custom Search