declarations that are needed to allow the commands of section two to be
compiled. For example,
%{
#include "tokens.h"
%}
can include the definitions of token values returned when tokens are matched.
Lex's second section defines a table of regular expressions and corresponding
commands in C. The first blank or tab that is neither escaped nor part of a quoted
string or character class is taken as the end of the regular expression. Thus,
one should avoid embedding unquoted blanks within regular expressions.
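The rule for locating the end of a regular expression can be made concrete with a small C sketch. The function pattern_length() below is a hypothetical helper, not part of Lex; it assumes a simplified syntax in which a backslash escapes the next character, double quotes delimit a quoted string, and square brackets delimit a character class. A real Lex implementation handles further cases (for example, a ] placed first in a character class).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the length of the regular-expression part
 * of a rule line, i.e. the index of the first blank or tab that is not
 * escaped, not inside a quoted string, and not inside a character class.
 * If no such blank exists, the whole line is the pattern. */
size_t pattern_length(const char *line)
{
    int in_quotes = 0;   /* inside "..." */
    int in_class = 0;    /* inside [...] */
    size_t i = 0;

    assert(line != NULL);
    while (line[i] != '\0') {
        char c = line[i];
        if (c == '\\' && line[i + 1] != '\0') {
            i += 2;                  /* skip the escaped character */
            continue;
        }
        if (in_quotes) {
            if (c == '"')
                in_quotes = 0;
        } else if (in_class) {
            if (c == ']')
                in_class = 0;
        } else if (c == '"') {
            in_quotes = 1;
        } else if (c == '[') {
            in_class = 1;
        } else if (c == ' ' || c == '\t') {
            break;                   /* first unprotected blank ends the pattern */
        }
        i++;
    }
    return i;
}
```

Under these assumptions, the blank inside "end if" or inside [ \t]+ does not end the pattern, whereas the blank that follows [a-z]+ does.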
When an expression is matched, its associated command is executed. If
an input sequence matches no expression, then the sequence is simply copied
verbatim to the standard output file. Input that is matched is stored in a
global string variable yytext (whose length is yyleng). Commands may alter
yytext in any way. The default size of yytext is determined by YYLMAX,
which is initially defined to be 200. All tokens, even those that will be ignored
(such as comments), are stored in yytext. Hence, you may need to redefine
YYLMAX to avoid overflow. An alternative approach to scanning comments
that is not prone to the danger of overflowing yytext involves the use of start
conditions [LS83, Joh83]. Flex, an improved version of Lex discussed in the
next section, automatically extends the size of yytext when necessary. This
removes the danger that a very long token may overflow the text buffer.
The content of yytext is overwritten as each new token is scanned. Therefore,
care must be taken to avoid returning the text of a token using a reference
into yytext. It is safer to copy the contents of yytext (e.g., using strcpy())
before the next call to yylex().
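The hazard can be illustrated with a small C sketch. So that the sketch stands alone without a generated scanner, fake_yytext and fake_scan() are hypothetical stand-ins for yytext and for the effect of a call to yylex(); save_lexeme() is likewise a hypothetical helper showing one way to take a private copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's text buffer. In a real Lex
 * scanner this role is played by yytext. */
static char fake_yytext[200];

/* Hypothetical stand-in for what the scanner does on each match:
 * the previous token's text is overwritten by the new one. */
void fake_scan(const char *lexeme)
{
    assert(lexeme != NULL && strlen(lexeme) < sizeof fake_yytext);
    strcpy(fake_yytext, lexeme);
}

/* Return a heap-allocated copy of the current token text, or NULL if
 * allocation fails. The caller owns the copy and must free() it. */
char *save_lexeme(void)
{
    char *copy = malloc(strlen(fake_yytext) + 1);
    if (copy != NULL)
        strcpy(copy, fake_yytext);
    return copy;
}
```

If a command keeps only a pointer into the buffer, that pointer refers to the text of whatever token was scanned most recently; a copy made with save_lexeme() before the next scan retains the original text.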
Lex allows regular expressions to overlap (that is, to match the same input
sequences). In the case of overlap, two rules are used to determine which
regular expression is matched:
1. The longest possible match is performed. Lex automatically buffers
characters while deciding how many characters can be matched.
2. If two expressions match exactly the same string, the earlier expression
(in order of definition in the Lex specification) is preferred.
Reserved words are often special cases of the pattern used for identifiers, so
their definitions are placed before the expression that defines an identifier
token. Often a "catchall" pattern is placed at the very end of section two. It is
used to catch characters that do not match any of the earlier patterns and hence
are probably erroneous. Recall that . matches any single character (other than
a newline). It is useful in a catchall pattern. However, avoid a pattern such as
.* because it will consume all characters up to the next newline.
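The two disambiguation rules, together with the placement of reserved words and the catchall pattern, can be sketched in C. This is an illustration of the selection policy only, not of how Lex is implemented (Lex generates a table-driven automaton). The token names, the matcher functions, and pick_rule() are all hypothetical; the rule table mimics a specification that lists the reserved word if first, then identifiers, then a single-character catchall.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum token { TOK_NONE, TOK_IF, TOK_ID, TOK_OTHER };

/* Each matcher returns how many leading characters of s it matches
 * (0 means no match). */
static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s)
{
    size_t n = 0;
    if (isalpha((unsigned char)s[0]))
        while (isalnum((unsigned char)s[n]))
            n++;
    return n;
}

/* Catchall: any single character other than a newline. */
static size_t match_any(const char *s)
{
    return (s[0] != '\0' && s[0] != '\n') ? 1 : 0;
}

struct rule {
    size_t (*match)(const char *);
    enum token tok;
};

/* Order matters: reserved word first, catchall last. */
static const struct rule rules[] = {
    { match_if,  TOK_IF },
    { match_id,  TOK_ID },
    { match_any, TOK_OTHER }
};

/* Choose the rule with the longest match; on a tie, the earlier rule
 * wins because only a strictly longer match replaces the current best. */
enum token pick_rule(const char *input, size_t *len)
{
    enum token best = TOK_NONE;
    size_t i;

    assert(input != NULL && len != NULL);
    *len = 0;
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(input);
        if (n > *len) {
            *len = n;
            best = rules[i].tok;
        }
    }
    return best;
}
```

With this table, the input if (x) is matched by both the reserved-word rule and the identifier rule at length 2, so the earlier rule is chosen; the input ifx is matched at length 3 by the identifier rule alone, so the longest-match rule selects it. A character such as @ falls through to the catchall, which consumes exactly one character.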
 