We re-number the states to produce the equivalent DFA shown in Figure 2.24.
FIGURE 2.24 A minimal DFA recognizing (a|b)*baa.
2.9 JavaCC: Tool for Generating Scanners
JavaCC (the CC stands for compiler-compiler) is a tool for generating lexical analyzers
from regular expressions, and parsers from context-free grammars. In this section we are
interested in the former; we visit the latter in the next chapter.
A lexical grammar specification takes the form of a set of regular expressions and a set of
lexical states; from any particular state, only certain regular expressions may be matched in
scanning the input. There is a standard DEFAULT state, in which scanning generally begins.
One may specify additional states as required.
Scanning a token proceeds by considering all regular expressions in the current state
and choosing the one that consumes the greatest number of input characters. After a match,
one can specify a state into which the scanner should go; otherwise the scanner stays in
the current state.
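To make the longest-match rule concrete, the following Java sketch picks, among the regular expressions of a single state, the one that consumes the most characters at the current position. It is only an illustration built on java.util.regex, not the code JavaCC generates; the class LongestMatchDemo, the method nextToken, and the three token kinds are hypothetical names chosen for the example. When two expressions match the same number of characters, the sketch keeps the one declared first, which is our understanding of how JavaCC resolves such ties.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the longest-match rule; not JavaCC's generated code. */
class LongestMatchDemo {
    // Regular expressions of one lexical state, in declaration order (names are made up).
    static final Map<String, Pattern> DEFAULT_STATE = new LinkedHashMap<>();
    static {
        DEFAULT_STATE.put("IF", Pattern.compile("if"));
        DEFAULT_STATE.put("IDENTIFIER", Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*"));
        DEFAULT_STATE.put("INT_LITERAL", Pattern.compile("[0-9]+"));
    }

    /** Returns "KIND:lexeme" for the longest match starting at pos, or null if nothing matches. */
    static String nextToken(String input, int pos) {
        String bestKind = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> e : DEFAULT_STATE.entrySet()) {
            Matcher m = e.getValue().matcher(input);
            m.region(pos, input.length());
            // lookingAt() only succeeds if the match begins at the start of the region.
            // The strict '>' keeps the earlier-declared expression when lengths tie.
            if (m.lookingAt() && m.end() - pos > bestLength) {
                bestKind = e.getKey();
                bestLength = m.end() - pos;
            }
        }
        return bestKind == null ? null : bestKind + ":" + input.substring(pos, pos + bestLength);
    }

    public static void main(String[] args) {
        System.out.println(nextToken("iffy = 1", 0)); // IDENTIFIER:iffy (4 characters beat 2)
        System.out.println(nextToken("if (x)", 0));   // IF:if (tie, IF is declared first)
    }
}
```

Note that the keyword IF wins over IDENTIFIER only because of the tie-break; on the input iffy the identifier is longer and is chosen instead.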
There are four kinds of regular expressions, determining what happens when the regular
expression has been matched:
1. SKIP: throws away the matched string.
2. MORE: continues to the next state, taking the matched string along.
3. TOKEN: creates a token from the matched string and returns it to the parser (or any
caller).
4. SPECIAL_TOKEN: creates a special token that does not participate in the parsing.
For example, a SKIP can be used for ignoring white space:
SKIP: { " " | "\t" | "\n" | "\r" | "\f" }
This matches one of the white space characters and throws it away; because we do not
specify a next state, the scanner remains in the current (DEFAULT) state.
We can deal with single-line comments with the following regular expressions:
MORE: { "//" : IN_SINGLE_LINE_COMMENT }
<IN_SINGLE_LINE_COMMENT>
SPECIAL_TOKEN: { <SINGLE_LINE_COMMENT: "\n" | "\r" | "\r\n"> : DEFAULT }
<IN_SINGLE_LINE_COMMENT>
MORE: { < ~[] > }
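The interplay of the two states can be traced with a small hand-written simulation. The Java sketch below is not what JavaCC generates; it merely mimics the white space and comment rules above. The class CommentScanDemo, the method scan, and the catch-all TOKEN rule for runs of other characters are assumptions added so that the example has something to return. MORE is modeled by appending to a buffer that is emitted only when the SPECIAL_TOKEN rule matches the line terminator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written simulation of the comment rules; not JavaCC output. */
class CommentScanDemo {
    enum State { DEFAULT, IN_SINGLE_LINE_COMMENT }

    // The characters listed in the SKIP rule.
    static boolean isSkipped(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
    }

    static List<String> scan(String input) {
        List<String> out = new ArrayList<>();
        State state = State.DEFAULT;
        StringBuilder image = new StringBuilder(); // text carried along by MORE
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (state == State.DEFAULT) {
                if (isSkipped(c)) {
                    i++;                                  // SKIP: discard, stay in DEFAULT
                } else if (input.startsWith("//", i)) {
                    image.append("//");                   // MORE: keep the text, change state
                    i += 2;
                    state = State.IN_SINGLE_LINE_COMMENT;
                } else {
                    // Assumed TOKEN rule: a run of characters up to white space or "//".
                    int start = i;
                    while (i < input.length() && !isSkipped(input.charAt(i))
                            && !input.startsWith("//", i)) {
                        i++;
                    }
                    out.add("TOKEN:" + input.substring(start, i));
                }
            } else if (c == '\n' || c == '\r') {
                // SPECIAL_TOKEN: "\r\n" is preferred over "\r" because it is longer.
                int len = input.startsWith("\r\n", i) ? 2 : 1;
                image.append(input, i, i + len);
                i += len;
                out.add("SPECIAL_TOKEN:" + image);
                image.setLength(0);
                state = State.DEFAULT;                    // explicit transition back
            } else {
                image.append(c);                          // MORE: ~[] matches any one character
                i++;
            }
        }
        // Assumption of this sketch: a comment left unterminated at end of input is dropped.
        return out;
    }

    public static void main(String[] args) {
        for (String t : scan("x // hi\ny")) {
            System.out.println(t.replace("\n", "\\n"));
        }
    }
}
```

On the input x // hi followed by a newline and y, the comment text accumulates across several MORE matches and is emitted as one special token, including its line terminator, between the tokens for x and y.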
 