To make this technique work for Java programs, we need to consider all
the contexts in which parentheses, braces, and brackets need not match. For
example, we should not consider a parenthesis as a symbol if it occurs inside a
comment, string constant, or character constant. We thus need routines to skip
comments, string constants, and character constants. A character constant in
Java can be difficult to recognize because of the many possible escape
sequences, so we need to simplify things. We want to design a program that works
for the bulk of inputs likely to occur.
Symbols in comments, string constants, and character constants need not be balanced.
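The skipping routines described above can be sketched as follows. This is a minimal illustration, not the book's code: the class name SymbolScanner and its methods are hypothetical, it works on a String rather than a stream, and it makes the same kind of simplification the text mentions by treating a backslash as escaping exactly one following character.

```java
// Hypothetical helper (not the book's Tokenizer): collects the grouping
// symbols that lie outside comments, string constants, and character constants.
class SymbolScanner {
    static String symbolsOutsideLiterals(String src) {
        StringBuilder symbols = new StringBuilder();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                i = skipLineComment(src, i + 2);
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                i = skipBlockComment(src, i + 2);
            } else if (c == '"' || c == '\'') {
                i = skipQuote(src, i + 1, c);
            } else {
                if ("(){}[]".indexOf(c) >= 0) {
                    symbols.append(c);
                }
                i++;
            }
        }
        return symbols.toString();
    }

    // Returns the index of the newline that ends the comment, or the end of input.
    private static int skipLineComment(String src, int i) {
        int end = src.indexOf('\n', i);
        return end < 0 ? src.length() : end;
    }

    // Returns the index just past the closing */, or the end of input if unterminated.
    private static int skipBlockComment(String src, int i) {
        int end = src.indexOf("*/", i);
        return end < 0 ? src.length() : end + 2;
    }

    // Simplifying assumption: a backslash escapes exactly one following character.
    // An unterminated constant is taken to end at the end of its line.
    private static int skipQuote(String src, int i, char quote) {
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2;
                continue;
            }
            if (c == '\n') {
                return i;
            }
            if (c == quote) {
                return i + 1;
            }
            i++;
        }
        return src.length();
    }
}
```

For the input `call("(", '[') /* { */ // ]` followed by a second line `{ a[0]; }`, the sketch returns `(){[]}`: the symbols inside the string constant, the character constant, and both comments are ignored.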
For the program to be useful, we must not only report mismatches but also
attempt to identify where the mismatches occur. Consequently, we keep track of
the line numbers where the symbols are seen. When an error is encountered,
obtaining an accurate message is always difficult. If there is an extra }, does that
mean that the } is extraneous? Or was a { missing earlier? We keep the error
handling as simple as possible, but once one error has been reported, the program
may get confused and start flagging many errors. Thus only the first error can be
considered meaningful. Even so, the program developed here is very useful.
Line numbers are needed for meaningful error messages.
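One way to picture the line-number bookkeeping is to push each opening symbol onto a stack together with the line on which it was seen. The sketch below is an assumption-laden illustration rather than the program developed in this chapter: the names LineTrackingChecker, Symbol, and firstError are invented here, the message wording is arbitrary, and comments and quoted constants are deliberately not skipped so that the example stays short. In keeping with the text, it stops at the first error.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: balanced-symbol check that remembers line numbers.
class LineTrackingChecker {
    // An opening symbol paired with the line on which it appeared.
    private static final class Symbol {
        final char token;
        final int line;

        Symbol(char token, int line) {
            this.token = token;
            this.line = line;
        }
    }

    // Returns a message describing the first error, or null if the symbols balance.
    static String firstError(String text) {
        Deque<Symbol> pending = new ArrayDeque<>();
        int line = 1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                line++;
            } else if (c == '(' || c == '[' || c == '{') {
                pending.push(new Symbol(c, line));
            } else if (c == ')' || c == ']' || c == '}') {
                if (pending.isEmpty()) {
                    return "Extraneous " + c + " at line " + line;
                }
                Symbol open = pending.pop();
                if (!matches(open.token, c)) {
                    return "Found " + c + " on line " + line
                            + "; does not match " + open.token
                            + " at line " + open.line;
                }
            }
        }
        if (!pending.isEmpty()) {
            Symbol open = pending.pop();
            return "Unmatched " + open.token + " at line " + open.line;
        }
        return null;
    }

    private static boolean matches(char open, char close) {
        return (open == '(' && close == ')')
                || (open == '[' && close == ']')
                || (open == '{' && close == '}');
    }
}
```

Note that an extra } is simply reported as extraneous; the sketch makes no attempt to decide whether a { was omitted earlier, which mirrors the decision to keep error handling simple.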
11.1.2 Implementation
The program has two basic components. One part, called tokenization, is the
process of scanning an input stream for opening and closing symbols
(the tokens) and generating the sequence of tokens that need to be recognized.
The second part is running the balanced symbol algorithm, based on the
tokens. The two basic components are represented as separate classes.
Figure 11.2 shows the Tokenizer class skeleton, and Figure 11.3 shows the
Balance class skeleton. The Tokenizer class provides a constructor that
requires a Reader and then provides a set of accessors that can be used to get
- The next token (either an opening/closing symbol for the code in this
  chapter or an identifier for the code in Chapter 12)
- The current line number
- The number of errors (mismatched quotes and comments)
The Tokenizer class maintains most of this information in private data members.
The Balance class also provides a similar constructor, but its only publicly
visible routine is checkBalance, shown at line 24. Everything else is a
supporting routine or a class data member.
Tokenization is the process of generating the sequence of symbols (tokens) that need to be recognized.
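To make the shape of such a class concrete, here is a rough sketch of a tokenizer that is constructed from a Reader and exposes the three accessors listed above. It is not the skeleton in Figure 11.2: the class name SketchTokenizer, the method names getNextOpenClose, getLineNumber, and getErrorCount, and the use of '\0' to signal end of input are all assumptions made for illustration. Only string constants are skipped here; comments and character constants are left out to keep the example brief.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

// Hypothetical sketch of a tokenizer with the three accessors described in the text.
class SketchTokenizer {
    private final PushbackReader in;  // wraps whatever Reader the caller supplies
    private int currentLine = 1;      // line of the most recently read character
    private int errors = 0;           // unterminated string constants seen so far

    SketchTokenizer(Reader source) {
        in = new PushbackReader(source);
    }

    int getLineNumber() {
        return currentLine;
    }

    int getErrorCount() {
        return errors;
    }

    // Returns the next opening or closing symbol, or '\0' at end of input (an assumption).
    char getNextOpenClose() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                currentLine++;
            } else if (c == '"') {
                skipString();
            } else if ("(){}[]".indexOf(c) >= 0) {
                return (char) c;
            }
        }
        return '\0';
    }

    // Skips to the closing quote; a string that reaches end of line counts as an error.
    private void skipString() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c == '"') {
                return;
            }
            if (c == '\\') {
                in.read();      // simplification: skip exactly one escaped character
            } else if (c == '\n') {
                in.unread(c);   // let the main loop count this line break
                break;
            }
        }
        errors++;
    }
}
```

A caller could construct it with `new SketchTokenizer(new java.io.StringReader(text))` or with a FileReader, then call getNextOpenClose repeatedly and consult getLineNumber after each token when building an error message.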
We begin by describing the Tokenizer class. It holds a reference to a
PushbackReader object that is initialized at construction. Because of the
I/O hierarchy (see Section 4.5.3), it may be constructed with any
Reader object. The current character being scanned is stored in ch, and
 