Java Reference
In-Depth Information
most of Nicklaus Wirth's languages, for example, Algol-W, Pascal, and Modula. Recursive
descent was used to produce the parser for the first implementations of C but then using
YACC; that and the fact that YACC was distributed with Unix popularized it. YACC was
the first LALR(1) parser generator with a reasonable execution time.
Interestingly, LL(1) and recursive descent parsers are enjoying greater popularity, for
example for the parsing of Java. Perhaps it is the simplicity of the predictive top-down
approach. It is now possible to come up with (mostly LL) predictive grammars for most
programming languages. True, none of these grammars are strictly LL(1); indeed, they
are not even unambiguous, look at the if-else statement in Java and almost every other
programming language. But these special cases may be handled specially, for example by
selective looking ahead k symbols in rules where it is necessary, and favoring the scanning
of an else when it is part of an if-statement. There are parser generators that allow the
parser developer to assert these special conditions in the grammatical specifications. One
of these is JavaCC, which we discuss in the next section.
3.5 Parser Generation Using JavaCC
In Chapter 2 we saw how JavaCC can be used to generate a lexical analyzer for j-- from an
input file ( j--.jj ) specifying the lexical structure of the language as regular expressions.
In this section, we will see how JavaCC can be used to generate an LL(k) recursive descent
parser for j-- from a file specifying its syntactic structure as EBNF (extended BNF) rules.
In addition to containing the regular expressions for the lexical structure for j--, the
j--.jj file also contains the syntactic rules for the language. The Java code between the
PARSER_BEGIN(JavaCCParser) and PARSER_END(JavaCCParser) block in the j--.jj le is
copied verbatim to the generated JavaCCParser.java file in the jminusminus package. This
code defines helper functions, which are available for use within the generated parser. Some
of the helpers include reportParserError() for reporting errors and recoverFromError()
for recovering from errors. Following this block is the specification for the scanner for j--,
and following that is the specification for the parser for j--.
We now describe the JavaCC syntactic specification. The general layout is this: we define
a start symbol, which is a high-level non-terminal ( compilationUnit in case of j--) that
references lower-level non-terminals. These lower-level non-terminals in turn reference the
tokens defined in the lexical specification.
When building a syntactic specification, we are not limited to literals and simple token
references. We can use the following EBNF syntax:
[a] for \zero or one", or an \optional" occurrence of a
(a) for \zero or more" occurrences of a
ajb for alternation, that is, either a or b
() for grouping
The syntax for a non-terminal declaration (or, production rule) in the input file almost
resembles that of a java method declaration; it has a return type (could be void ), a name,
can accept arguments, and has a body that specifies the extended BNF rules along with
any actions that we want performed as the production rule is parsed. It also has a block
preceding the body; this block declares any local variables used within the body. Syntactic
 
Search WWH ::




Custom Search