Even if a source listing is not requested, each token should record the
line number on which it appeared. The token's position in the source line
may also be useful. If an error involving the token is noted, the line number
and position marker can be used to improve the quality of error messages by
specifying where in the source file the error occurred. It is straightforward
to open the source file and then list the source line containing the error, with
the error message immediately below it. Sometimes, an error may not be
detected until long after the line containing the error has been processed. An
example of this is a goto to an undefined label. If such delayed errors are rare
(as they usually are), then a message citing a line number can be produced,
for example, “Undefined label in statement 101.” In languages that freely
allow forward references, delayed errors may be numerous. For example,
Java allows a method to be called before its declaration appears. In this case, a file
of error messages keyed with line numbers can be written and later merged
with the processed source lines to produce a complete source listing. Source
line numbers are also required for reporting post-scanning errors in multipass
compilers. For example, a type conversion error may arise during semantic
analysis; associating a line number with the error message greatly helps a
programmer understand and correct the error.
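The following is a minimal sketch of both reporting styles, assuming a hypothetical `Token` record and a hypothetical `ErrorListing` helper (neither comes from the text). `formatError` prints the offending source line, a caret under the token's column, and the message immediately below. `mergeListing` stands in for merging delayed errors with the source; for brevity it keeps the messages in an in-memory map keyed by line number rather than writing them to a separate file.

```java
import java.util.List;
import java.util.Map;

// Hypothetical token type: kind, lexeme, and 1-based line and column where it appeared.
record Token(String kind, String text, int line, int column) {}

// Hypothetical helper showing two ways to use token positions in error reports.
final class ErrorListing {
    private ErrorListing() {}

    // Immediate report: the source line, a caret under the token's column,
    // and the message directly below. Assumes columns count characters
    // and that the line contains no tabs.
    static String formatError(List<String> sourceLines, Token tok, String message) {
        String src = sourceLines.get(tok.line() - 1);
        String caret = " ".repeat(tok.column() - 1) + "^";
        return src + "\n" + caret + "\n"
                + "Error (line " + tok.line() + ", column " + tok.column() + "): " + message;
    }

    // Delayed errors (e.g., forward references resolved only after the whole
    // input is seen) are collected by line number, then merged with the
    // source so each message follows the line it cites.
    static String mergeListing(List<String> sourceLines, Map<Integer, List<String>> errorsByLine) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sourceLines.size(); i++) {
            int lineNo = i + 1;
            out.append(String.format("%4d  %s", lineNo, sourceLines.get(i))).append('\n');
            for (String msg : errorsByLine.getOrDefault(lineNo, List.of())) {
                out.append("      *** ").append(msg).append('\n');
            }
        }
        return out.toString();
    }
}
```

Columns here are plain character counts; a real scanner would also have to decide how a tab advances the column so that the caret lines up.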
A common view is that compilers should just concentrate on translation
and code generation and leave the listing and prettyprinting (but not error
messages) to other tools. This considerably simplifies the scanner.
3.7.3 Terminating the Scanner
A scanner is designed to read input characters and partition them into tokens.
When the end of the input file is reached, it is convenient to create an end-of-file
pseudocharacter.
In Java, for example, InputStream.read() reads a single byte and returns it
as an int in the range 0 to 255, or -1 when end-of-file is reached. Because
-1 lies outside the range of real byte values, a constant Eof, defined as -1,
can be treated as an “extended” ASCII character. This character then allows the
definition of an EndFile token that can be passed back to the parser. The
EndFile token is useful in a CFG because it allows the parser to verify that the
logical end of a program corresponds to its physical end. In fact, LL(1) parsers
(discussed in Chapter 5) and LALR(1) parsers (discussed in Chapter 6) require
an EndFile token.
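As a sketch of how the -1 returned by `InputStream.read()` can become an EndFile token, the hypothetical `TinyScanner` below recognizes only identifiers and integer literals in ASCII input (a restriction assumed to keep it short). The names `Eof` and `EndFile` follow the text; the class, the other token kinds, and `fromString` are illustrative. Once the scanner has seen `Eof` it never reads the stream again, so repeated calls keep yielding EndFile.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical scanner for a tiny language of identifiers and integer literals.
final class TinyScanner {
    // End-of-file pseudocharacter: InputStream.read() returns -1 at end of stream,
    // a value outside the 0..255 range of real bytes.
    static final int Eof = -1;

    enum Kind { Identifier, IntLiteral, EndFile }

    record Token(Kind kind, String text) {}

    private final InputStream in;
    private int current; // one-character lookahead; holds Eof once input is exhausted

    TinyScanner(InputStream in) {
        this.in = in;
        this.current = readByte();
    }

    // Convenience factory for examples: scans an ASCII string.
    static TinyScanner fromString(String source) {
        return new TinyScanner(new ByteArrayInputStream(source.getBytes(StandardCharsets.US_ASCII)));
    }

    private int readByte() {
        try {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void advance() {
        if (current != Eof) { // after Eof, never touch the stream again
            current = readByte();
        }
    }

    Token nextToken() {
        while (current != Eof && Character.isWhitespace(current)) {
            advance();
        }
        if (current == Eof) {
            // Eof maps to the EndFile token; because current stays Eof,
            // every later call also returns EndFile.
            return new Token(Kind.EndFile, "");
        }
        StringBuilder text = new StringBuilder();
        if (Character.isDigit(current)) {
            while (current != Eof && Character.isDigit(current)) {
                text.append((char) current);
                advance();
            }
            return new Token(Kind.IntLiteral, text.toString());
        }
        if (Character.isLetter(current)) {
            while (current != Eof && Character.isLetterOrDigit(current)) {
                text.append((char) current);
                advance();
            }
            return new Token(Kind.Identifier, text.toString());
        }
        throw new IllegalStateException("unexpected character: " + (char) current);
    }
}
```

For the input `abc 42`, successive calls to `nextToken()` yield an Identifier `abc`, an IntLiteral `42`, and then EndFile on every further call.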
What will happen if a scanner is called after end-of-file is reached?
Obviously, a fatal error could be registered, but this would destroy our simple
model in which the scanner always returns a token. A better approach is to
continue to return the EndFile token to the parser. This allows the parser to
handle termination cleanly, especially since the EndFile token is normally
syntactically valid only after a complete program is parsed. If the EndFile token