3.5.4 Character Processing Using Lex
Although Lex is often used to produce scanners, it is really a general-purpose
character-processing tool, programmed using regular expressions. Lex provides
no character-tossing mechanism because this would be too special purpose.
We may need to process the token text (stored in yytext) before returning
a token code. This is normally done by calling a subroutine in the command
associated with a regular expression. The definitions of such subroutines may
be placed in the final section of the Lex specification. For example, we might
want to call a subroutine to insert an identifier into a symbol table before it is
returned to the parser. For ac, the line
{Non_f_i_p}    {insert(yytext); return(ID);}
could do this, with insert defined in the final section. Alternatively, the
definition of insert could be placed in a separate file containing symbol table
routines. This would allow insert to be changed and recompiled without
Lex's having to be rerun. (Some implementations of Lex generate scanners
rather slowly.)
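As a rough illustration, a separately compiled insert might look like the following C sketch. The table layout, the fixed capacity MAX_SYMBOLS, the integer return value, and the helper lookup are assumptions made for this example; the text says only that insert receives the token text. The sketch copies its argument because the scanner reuses yytext for the next token.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMBOLS 256              /* assumed fixed capacity for this sketch */

static char *symbols[MAX_SYMBOLS];   /* private copies of identifier names */
static int symbol_count = 0;

/* Return the index of name in the table, or -1 if it is absent. */
int lookup(const char *name)
{
    for (int i = 0; i < symbol_count; i++)
        if (strcmp(symbols[i], name) == 0)
            return i;
    return -1;
}

/* Insert name if it is new and return its index.
   Returns -1 when the table is full or memory is exhausted. */
int insert(const char *name)
{
    assert(name != NULL);
    int i = lookup(name);
    if (i >= 0)
        return i;                    /* already present */
    if (symbol_count == MAX_SYMBOLS)
        return -1;
    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);              /* keep our own copy of the token text */
    symbols[symbol_count] = copy;
    return symbol_count++;
}
```

Because the routine lives in its own file, it can be rebuilt and relinked while the generated scanner is left untouched.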
In Lex, end-of-file is not handled by regular expressions. A predefined
EndFile token, with a token code of zero, is automatically returned when
end-of-file is reached at the beginning of a call to yylex(). It is up to the parser to
recognize the zero return value as signifying the EndFile token.
If more than one source file must be scanned, this fact is hidden inside
the scanner mechanism. yylex() uses three user-defined functions to handle
character-level I/O:
input() Reads a single character; zero is returned on end-of-file.
output(c) Writes a single character to output.
unput(c) Puts a single character back into the input to be reread.
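To make the three roles concrete, here is a minimal C sketch of user-written versions that read from an in-memory string instead of a file. The helper set_source, the PUSHBACK_MAX limit, and the success flag returned by unput are assumptions of this example, not part of Lex; real replacements are usually written as macros.

```c
#include <assert.h>
#include <stdio.h>

#define PUSHBACK_MAX 64                  /* assumed pushback limit */

static const char *source = "";          /* text still to be scanned */
static char pushback[PUSHBACK_MAX];      /* characters returned by unput() */
static int pushed = 0;

/* Select the string that input() will read from. */
void set_source(const char *text)
{
    assert(text != NULL);
    source = text;
    pushed = 0;
}

/* Read one character; zero signals end of input. */
int input(void)
{
    if (pushed > 0)
        return (unsigned char)pushback[--pushed];
    if (*source == '\0')
        return 0;
    return (unsigned char)*source++;
}

/* Put c back so that the next input() call returns it.
   Returns 1 on success, 0 if the pushback buffer is full. */
int unput(int c)
{
    if (pushed == PUSHBACK_MAX)
        return 0;
    pushback[pushed++] = (char)c;
    return 1;
}

/* Write one character to standard output. */
void output(int c)
{
    putchar(c);
}
```

Characters pushed back are reread in last-in, first-out order, which is what a scanner needs when it has read past the end of a token.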
When yylex() encounters end-of-file, it calls a user-supplied integer function
named yywrap(). The purpose of this routine is to "wrap up" input processing.
It returns the value 1 if there is no more input. Otherwise, it returns zero
and arranges for input() to provide more characters.
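One way a yywrap() for several source files could be written is sketched below in C. The queue of file names, the helper set_inputs, and the accompanying input() are assumptions of this example; the sketch also assumes yywrap() is called once before scanning starts, so that the first file is opened the same way as the later ones.

```c
#include <assert.h>
#include <stdio.h>

static const char **queue = NULL;    /* remaining file names, NULL-terminated */
static FILE *current = NULL;         /* file that input() is reading */

/* Supply the list of files to scan; the list must end with NULL. */
void set_inputs(const char **names)
{
    assert(names != NULL);
    queue = names;
}

/* Close the finished file and open the next one that can be read.
   Returns 0 if input() now has more characters, 1 if no input remains. */
int yywrap(void)
{
    if (current != NULL) {
        fclose(current);
        current = NULL;
    }
    while (queue != NULL && *queue != NULL) {
        current = fopen(*queue++, "r");
        if (current != NULL)
            return 0;                /* more input is available */
    }
    return 1;                        /* nothing left to scan */
}

/* Read one character from the current file; zero means end-of-file. */
int input(void)
{
    if (current == NULL)
        return 0;
    int c = getc(current);
    return c == EOF ? 0 : c;
}
```

Files that cannot be opened are skipped silently here; a production scanner would more likely report them.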
The definitions for the input(), output(), unput(), and yywrap() functions
may be supplied by the compiler writer (usually as C macros). Lex
supplies default versions that read characters from the standard input and
write them to the standard output. The default version of yywrap() simply
returns 1, thereby signifying that there is no more input. (The use of output()
allows Lex to be used as a tool for producing stand-alone data "filters" for
transforming a stream of data.)
Lex-generated scanners normally select the longest possible input sequence
that matches some token definition. Occasionally this can be a problem.
 
 