Buffered Token    Token Flag
1                 Integer literal.
12                Integer literal.
12.               Floating-point literal.
12.3              Floating-point literal.
12.3e             Invalid (but valid prefix).
12.3e+            Invalid (but valid prefix).
Figure 3.15: Building the token buffer and setting token flags when
scanning with a backup.
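The flags in Figure 3.15 can be reproduced with a small sketch. It is only an illustration: the class name BackupScanSketch, the state names, and the literal grammar (digits, optionally followed by a period, more digits, and an exponent) are assumptions chosen to match the table, not a definitive scanner. The scanner remembers where the buffered token was last flagged valid and backs up to that point when no further progress is possible.

```java
final class BackupScanSketch {
    // Hypothetical DFA states for: digit+ | digit+ '.' digit* ('e' ('+'|'-')? digit+)?
    private static final int ERROR = -1, START = 0, INT = 1, FRAC = 2,
                             EXP_E = 3, EXP_SIGN = 4, EXP_DIGITS = 5;

    private static int step(int state, char c) {
        boolean digit = c >= '0' && c <= '9';
        switch (state) {
            case START:      return digit ? INT : ERROR;
            case INT:        return digit ? INT : (c == '.' ? FRAC : ERROR);
            case FRAC:       return digit ? FRAC : (c == 'e' ? EXP_E : ERROR);
            case EXP_E:      return digit ? EXP_DIGITS
                                          : ((c == '+' || c == '-') ? EXP_SIGN : ERROR);
            case EXP_SIGN:   return digit ? EXP_DIGITS : ERROR;
            case EXP_DIGITS: return digit ? EXP_DIGITS : ERROR;
            default:         return ERROR;
        }
    }

    // States whose buffered token would be flagged as a valid literal.
    private static boolean accepting(int state) {
        return state == INT || state == FRAC || state == EXP_DIGITS;
    }

    /** Returns the length of the longest valid literal starting at start, or 0 if none. */
    static int longestLiteral(String input, int start) {
        int state = START;
        int lastValidEnd = start; // end of the most recent buffered token flagged valid
        for (int i = start; i < input.length(); i++) {
            state = step(state, input.charAt(i));
            if (state == ERROR) break;                  // no longer a valid prefix
            if (accepting(state)) lastValidEnd = i + 1; // remember where to back up to
        }
        return lastValidEnd - start;
    }

    public static void main(String[] args) {
        String input = "12.3e+q"; // assumed input; the table stops at the prefix 12.3e+
        int len = longestLiteral(input, 0);
        System.out.println(input.substring(0, len)); // prints 12.3
    }
}
```

On the assumed input 12.3e+q, the scanner reads as far as 12.3e+, finds that q cannot extend it, and backs up to 12.3, the last buffered token flagged as valid. Scanning then resumes at e.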
be a real performance bottleneck in production compilers. Hence, it is a good
idea to consider how to increase scanning speed.
One approach to increasing scanner speed is to use a scanner generator
such as Flex or GLA that is designed to generate fast scanners. These generators
will incorporate many “tricks” that increase speed in clever ways.
If you hand-code a scanner, a few general principles can increase scanner
performance dramatically. Try to block character-level operations whenever
possible. It is usually better to do one operation on n characters rather than n
operations on single characters. This is most apparent in reading characters.
In the examples herein, characters are input one at a time, perhaps using
Java's InputStream.read (or a C or C++ equivalent). Using single-character
processing can be quite inefficient. A subroutine call can cost hundreds or
thousands of instructions to execute—far too many for a single character.
Routines such as InputStream.read(buffer) perform block reads, putting
an entire block of characters directly into buffer. Usually, the number of
characters read is set to the size of a disk block (512 or perhaps 1024 bytes)
so that an entire disk block can be read in one operation. If fewer than the
requested number of characters are returned, then we know we have reached
end-of-file. An end-of-file (EOF) character can be set to indicate this.
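A block read might be sketched as follows. The class name BlockReadSketch, the 1024-byte block size, and the newline-counting task are assumptions made for illustration. Note that in Java, InputStream.read(buffer) signals end-of-file by returning -1, and a short read does not by itself guarantee that the end of the input has been reached, so the sketch tests for -1 rather than for a short count.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BlockReadSketch {
    static final int BLOCK_SIZE = 1024; // assumed block size

    /** Counts newline bytes using one read call per block instead of one per byte. */
    static int countNewlines(InputStream in) {
        byte[] buffer = new byte[BLOCK_SIZE];
        int count = 0;
        try {
            int n;
            // read(buffer) returns the number of bytes stored, or -1 at end-of-file.
            while ((n = in.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = "a\nb\nc\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countNewlines(new ByteArrayInputStream(data)));
    }
}
```

The inner loop still examines every byte, but it does so with array indexing rather than a method call per character, which is the saving the text describes.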
One problem with reading blocks of characters is that the end of a block
won't usually correspond to the end of a token. For example, the beginning
of a quoted string may be found near the end of a block, but not the string's
end. Another read operation to get the rest of the string may overwrite the
first part.
Double-buffering can avoid this problem, as shown in Figure 3.16. Input is
first read into the left buffer, then into the right buffer, and then the left
buffer is overwritten. Unless a token whose text we want to save is longer than
the buffer length, tokens can cross a buffer boundary without difficulty.
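The double-buffering scheme just described could be sketched roughly as below. All names here (DoubleBufferSketch, markTokenStart, lexeme) are hypothetical, and treating the two buffers as the halves of one array is a design choice of this sketch rather than something the text prescribes. The sketch refuses to return a token longer than one buffer, since its first characters may already have been overwritten.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class DoubleBufferSketch {
    private final Reader in;
    private final int half;      // length of each buffer (left and right)
    private final char[] buf;    // left = [0, half), right = [half, 2 * half)
    private long consumed = 0;   // total characters handed to the scanner
    private long loaded = 0;     // total characters read from the input
    private long tokenStart = 0; // absolute position where the current token began
    private boolean eof = false;

    DoubleBufferSketch(Reader in, int half) {
        this.in = in;
        this.half = half;
        this.buf = new char[2 * half];
    }

    /** Returns the next character, or -1 at end of input. */
    int next() {
        if (consumed == loaded && !eof) fillNextHalf();
        if (consumed == loaded) return -1;
        return buf[(int) (consumed++ % buf.length)];
    }

    // Fills the left and right buffers alternately, overwriting the older one.
    private void fillNextHalf() {
        int base = (int) (loaded % buf.length); // 0 (left) or half (right)
        int filled = 0;
        try {
            while (filled < half) {
                int n = in.read(buf, base + filled, half - filled);
                if (n == -1) { eof = true; break; }
                filled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        loaded += filled;
    }

    void markTokenStart() { tokenStart = consumed; }

    /** Returns the text from the marked start up to the current position. */
    String lexeme() {
        if (consumed - tokenStart > half) {
            throw new IllegalStateException("token longer than one buffer; its start may be overwritten");
        }
        StringBuilder sb = new StringBuilder();
        for (long p = tokenStart; p < consumed; p++) sb.append(buf[(int) (p % buf.length)]);
        return sb.toString();
    }

    public static void main(String[] args) {
        DoubleBufferSketch db = new DoubleBufferSketch(new StringReader("ab1234cd"), 4);
        db.next(); db.next();                   // skip "ab"
        db.markTokenStart();
        for (int i = 0; i < 4; i++) db.next();  // "12" is in the left buffer, "34" in the right
        System.out.println(db.lexeme());        // prints 1234
    }
}
```

With a buffer length of 4, the token 1234 straddles the boundary between the left and right buffers yet is recovered intact, because the left buffer is not overwritten until the right one has been fully consumed.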
If the buffer size is made large enough (say 512 or 1,024 characters), then the
chance of losing part of a token is very low. If a token's length is near the
buffer's
 