For example, if we allow Fortran-like fixed-decimal literals such as 1. and .10
and the Pascal subrange operator "..", then 1..10 will most likely be
misscanned as two fixed-decimal literals rather than two integer literals separated
by the subrange operator. Lex allows us to define a regular expression that
applies only if some other expression immediately follows it. For example,
r/s tells Lex to match regular expression r, but only if regular expression s
immediately follows it. The expression s is right context. That is, it is not
part of the token that is matched, but it must be present for r to be matched.
Thus [0-9]+/".." would match an integer literal, but only if .. immediately
follows it. Since this pattern covers more characters than the one defining a
fixed-decimal literal, it takes precedence. The longest match is still chosen,
but the right-context characters are returned to the input so that they can be
matched as part of a later token.
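The effect of right context on 1..10 can be sketched outside Lex. The following is a minimal Python simulation, not Lex itself: the rule names, the tiny rule set, and the scan function are assumptions made for illustration, and Python's re module is a backtracking matcher rather than the automaton Lex generates. It picks the rule whose match (including any right context) is longest, then consumes only the token part.

```python
import re

# Each rule is (token name, pattern r, right context s or None).
# Rule names and this tiny rule set are illustrative assumptions.
BASE_RULES = [
    ("FIXED", r"[0-9]+\.[0-9]*|\.[0-9]+", None),  # literals such as 1. and .10
    ("INT", r"[0-9]+", None),
    ("RANGE", r"\.\.", None),
]
# Mimics the Lex rule [0-9]+/".." : an integer, but only if ".." follows.
CONTEXT_RULE = ("INT", r"[0-9]+", r"\.\.")


def scan(text, rules):
    """Return (name, lexeme) pairs: longest match wins, earliest rule on ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None  # (matched length including context, name, token length)
        for name, r, s in rules:
            pattern = "(%s)" % r if s is None else "(%s)(?:%s)" % (r, s)
            m = re.compile(pattern).match(text, pos)
            if m is None or m.end(1) == pos:
                continue
            total = m.end() - pos
            if best is None or total > best[0]:
                best = (total, name, m.end(1) - pos)
        if best is None:
            raise ValueError("no rule matches at position %d" % pos)
        _, name, token_len = best
        tokens.append((name, text[pos:pos + token_len]))
        pos += token_len  # right-context characters stay in the input
    return tokens


print(scan("1..10", BASE_RULES))
print(scan("1..10", [CONTEXT_RULE] + BASE_RULES))
```

Without the context rule, the scan yields two fixed-decimal literals (1. and .10). With it, the first rule matches three characters (1 plus the two dots), beats the two-character fixed-decimal match, and returns only the 1 as the token, so the remaining input is scanned as the subrange operator followed by the integer 10.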
The operators and special symbols most commonly used in Lex are summarized
in Figure 3.13. Note that a symbol sometimes has one meaning in a
regular expression and an entirely different meaning in a character class (that
is, within a pair of brackets). If you find Lex behaving unexpectedly, it is a good
idea to check this table to be sure how the operators and symbols you have
used behave. Ordinary letters and digits, as well as symbols not mentioned
(such as @), represent themselves. If you are not sure whether a character is
special, you can always escape it or make it part of a quoted string.
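The point about symbols changing meaning inside brackets can be seen with two symbols that behave the same way in Lex and in many other regular expression dialects. The sketch below uses Python's re module as a stand-in, which is an assumption of convenience: it is not Lex and its notation differs in other respects. The helper first_match is hypothetical and exists only for this illustration.

```python
import re


def first_match(pattern, text):
    """Return the first substring of text matching pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# ^ outside brackets anchors the match at the beginning of the input.
print(first_match(r"^ab", "abab"))
# ^ as the first symbol inside brackets complements the character class.
print(first_match(r"[^ab]", "abcab"))
# - outside brackets is an ordinary character.
print(first_match(r"a-c", "b a-c"))
# - inside brackets denotes a range of characters.
print(first_match(r"[a-c]", "xbz"))
# An unescaped . matches any character; escaping makes it literal.
print(first_match(r"1.5", "125 1.5"))
print(first_match(re.escape("1.5"), "125 1.5"))
```

The last two calls show why escaping is the safe default when in doubt: the unescaped pattern matches 125, while the escaped one matches only the literal text 1.5.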
In summary, Lex is a very flexible generator that can produce a complete
scanner from a succinct definition. The difficult part of working with Lex is
learning its notation and rules. Once you have done this, Lex will relieve
you of the many chores of writing a scanner (for example, reading characters,
buffering them, and deciding which token pattern matches). Moreover, Lex's
notation for representing regular expressions is used in other Unix programs,
most notably the grep pattern matching utility.
Lex can also transform input as a preprocessor, as well as scan it. It
provides a number of advanced features beyond those discussed here. It does
require that code segments be written in C, and hence it is not language-
independent.
3.6 Other Scanner Generators
Lex is certainly the most widely known and widely available scanner generator
because it is distributed as part of the Unix system. Even after years of use,
it still has bugs, however, and produces scanners too slow to be used in
production compilers. This section briefly discusses some of the alternatives
to Lex, including Flex, JLex, Alex, Lexgen, GLA, and re2c.
It has been shown that Lex can be improved so that it is always faster than
a handwritten scanner [Jac87]. This is done using Flex, a widely used, freely
 
 