Figure 3.6: The operation of the Lex scanner generator. (Diagram labels: Scanner Specification, Lex, Scanner Module (in C).)
programmed. Rather, we can focus on the character structure of tokens and
how they are to be processed.
The primary purpose of this section is to show how regular expressions
and related information are presented to scanner generators. A helpful way
to learn Lex is to start with the simple examples presented here and then
gradually generalize them to solve the problem at hand. To inexperienced
readers, Lex's rules may seem unnecessarily complex. It is best to keep in
mind that the key is always the specification of tokens as regular expressions.
The rest is there simply to increase efficiency and handle various details.
3.5.1 Defining Tokens in Lex
Lex's approach to scanning is simple. It allows the user to associate regular
expressions with commands coded in C (or C++). When input characters that
match the regular expression are read, the associated commands are executed.
Users of Lex do not specify how to match tokens, except by providing the
regular expressions. The associated commands specify what should be done
when a particular token is matched.
Lex creates a file lex.yy.c that contains an integer function yylex(). This
function is normally called from the parser whenever another token is needed.
The value that yylex() returns is the token code of the token scanned by
Lex. Tokens such as whitespace are deleted simply by having their associated
command not return anything. Scanning continues until a command with a
return in it is executed.
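
The calling convention described above can be imitated by hand in C. The following is only a sketch of the behavior, not the code Lex generates: the names sketch_yylex, sketch_input, TOK_END, and TOK_NUMBER are hypothetical, and the token code values are arbitrary. Whitespace is consumed by a branch that does not return, so the loop keeps scanning until a branch with a return is reached.

```c
#include <ctype.h>

/* Hypothetical token codes for this sketch. A real yylex() returns 0 at
   end of input; the value chosen for TOK_NUMBER is arbitrary. */
enum { TOK_END = 0, TOK_NUMBER = 258 };

/* Hypothetical input buffer standing in for the scanner's input stream. */
static const char *sketch_input = "";

/* Hand-written stand-in for a generated yylex(). */
int sketch_yylex(void) {
    for (;;) {
        unsigned char c = (unsigned char)*sketch_input;
        if (c == '\0')
            return TOK_END;          /* end of input */
        if (isspace(c)) {
            sketch_input++;          /* action without a return: the */
            continue;                /* whitespace is simply discarded */
        }
        if (isdigit(c)) {
            while (isdigit((unsigned char)*sketch_input))
                sketch_input++;      /* consume the whole digit string */
            return TOK_NUMBER;       /* action with a return: token delivered */
        }
        sketch_input++;              /* this sketch skips any other character */
    }
}
```

A caller playing the role of the parser would set sketch_input to the text to be scanned and call sketch_yylex() repeatedly until it returns TOK_END.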
Figure 3.7 illustrates a simple Lex definition for the three reserved words
(f, i, and p) of the ac language introduced in Chapter 2. When a string
matching any of these three reserved keywords is found, then the appropriate
token code is returned. It is vital that the token codes that are returned when a
token is matched are identical to those expected by the parser. If they are not,
then the parser will not see the same token sequence produced by the scanner.
This will cause the parser to generate false syntax errors based on the incorrect
token stream it sees.
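
The consistency requirement can be illustrated with a small C sketch in which the scanner side and the parser side both rely on a single set of token codes. The token names FLOATDCL, INTDCL, and PRINT, the helper functions, and the numeric values are assumptions made for this illustration; they are not taken from Figure 3.7.

```c
#include <stddef.h>

/* Hypothetical token codes with arbitrary values. Both sides below use
   this one definition, so they cannot disagree about what a code means. */
enum ac_token { AC_UNKNOWN = -1, FLOATDCL = 258, INTDCL = 259, PRINT = 260 };

/* Scanner side (hypothetical helper): map a reserved word of ac,
   which is a single character, to its token code. */
int ac_reserved_code(char c) {
    switch (c) {
    case 'f': return FLOATDCL;
    case 'i': return INTDCL;
    case 'p': return PRINT;
    default:  return AC_UNKNOWN;
    }
}

/* Parser side (hypothetical helper): interpret a token code using the
   same enum. If this side used different numbers, it would see the
   wrong tokens and report false syntax errors. */
const char *ac_token_name(int code) {
    switch (code) {
    case FLOATDCL: return "FLOATDCL";
    case INTDCL:   return "INTDCL";
    case PRINT:    return "PRINT";
    default:       return NULL;
    }
}
```

Because both functions refer to the same enumeration, a change to a token's numeric value is picked up by the scanner and the parser together.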
It is standard for the scanner and parser to share the definition of token
codes to guarantee that consistent values are seen by both. The file y.tab.h,