    if (ch == '/') {
        nextCh();
        if (ch == '/') {
            // CharReader maps all new lines to '\n'
            while (ch != '\n' && ch != EOFCH) {
                nextCh();
            }
        }
        else {
            reportScannerError("Operator / is not supported in j--.");
        }
    }
    else {
        moreWhiteSpace = false;
    }
}
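The fragment above depends on scanner state that is defined elsewhere (ch, nextCh(), EOFCH, moreWhiteSpace). As a rough, self-contained sketch of the same idea, the class below skips whitespace and // comments over an in-memory string. The class name CommentSkipper, the String-based input, the value chosen for EOFCH, and the use of an exception in place of reportScannerError() are all assumptions made for illustration; they are not taken from Scanner.java.

```java
public class CommentSkipper {
    // End-of-input sentinel. The name mirrors EOFCH in the fragment above;
    // the value used here is an assumption.
    static final char EOFCH = (char) -1;

    private final String input; // hypothetical stand-in for the character reader
    private int pos = 0;
    private char ch;

    public CommentSkipper(String input) {
        this.input = input;
        nextCh(); // prime ch with the first character
    }

    // Advance ch to the next character, or to EOFCH once the input is exhausted.
    private void nextCh() {
        ch = pos < input.length() ? input.charAt(pos++) : EOFCH;
    }

    // Skip whitespace and // comments, then return the first significant
    // character (or EOFCH if there is none).
    public char skip() {
        boolean moreWhiteSpace = true;
        while (moreWhiteSpace) {
            while (Character.isWhitespace(ch)) {
                nextCh();
            }
            if (ch == '/') {
                nextCh();
                if (ch == '/') {
                    // A comment runs to the end of the line or the end of input.
                    while (ch != '\n' && ch != EOFCH) {
                        nextCh();
                    }
                } else {
                    // Simplification: the fragment above reports an error and
                    // keeps scanning; this sketch throws instead.
                    throw new IllegalStateException(
                            "Operator / is not supported in j--.");
                }
            } else {
                moreWhiteSpace = false;
            }
        }
        return ch;
    }

    public static void main(String[] args) {
        CommentSkipper s = new CommentSkipper("  // a comment\n   x = 1;");
        System.out.println(s.skip()); // prints x
    }
}
```

Note that the outer loop runs again after a comment ends, so any whitespace or further comments that follow are skipped as well.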
There are other kinds of tokens we must recognize as well, for example, String literals
and character literals. The code for recognizing all tokens appears in file Scanner.java; the
principal method of interest is getNextToken(). This file is part of the source code of the j--
compiler that we discussed in Chapter 1. At the end of this chapter you will find exercises
that ask you to modify this code (as well as that of other files) to add tokens and other
functionality to our lexical analyzer.
A pertinent quality of the lexical analyzer described here is that it is hand-crafted.
Although writing a lexical analyzer by hand is relatively easy, particularly if it is based on
a state transition diagram, it is prone to error. In a later section we shall learn how we may
automatically produce a lexical analyzer from a notation based on regular expressions.
2.3 Regular Expressions
Regular expressions comprise a relatively simple notation for describing patterns of
characters in text. For this reason, one finds them in text processing tools such as text editors. We
are interested in them here because they are also convenient for describing lexical tokens.
Definition 2.1. We say that a regular expression defines a language of strings over an
alphabet. Regular expressions may take one of the following forms:
1. If a is in our alphabet, then the regular expression a describes the language consisting
of the string a. We call this language L(a).
2. If r and s are regular expressions, then their concatenation rs is also a regular
expression describing the language of all possible strings obtained by concatenating a
string in the language described by r with a string in the language described by s. We
call this language L(rs).
3. If r and s are regular expressions, then the alternation r|s is also a regular expression
describing the language consisting of all strings described by either r or s. We call
this language L(r|s).
4. If r is a regular expression, the repetition² r* is also a regular expression describing
the language consisting of strings obtained by concatenating zero or more instances
of strings described by r together. We call this language L(r*).
² Also known as the Kleene closure.
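The four forms in Definition 2.1 can be tried out with Java's java.util.regex package, whose syntax for single characters, concatenation, alternation (|), and repetition (*) agrees with the notation used here. This is only an illustrative sketch; the class name RegexForms and the helper inLanguage() are hypothetical. Pattern.matches() succeeds only when the entire string matches, which corresponds to asking whether the string is a member of the language the expression describes.

```java
import java.util.regex.Pattern;

public class RegexForms {
    // Returns true if the whole string s belongs to the language
    // described by the regular expression regex.
    static boolean inLanguage(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        // Form 1: a single symbol; L(a) contains only "a".
        System.out.println(inLanguage("a", "a"));    // true
        System.out.println(inLanguage("a", "b"));    // false

        // Form 2: concatenation; L(ab) contains only "ab".
        System.out.println(inLanguage("ab", "ab"));  // true

        // Form 3: alternation; L(a|b) contains "a" and "b".
        System.out.println(inLanguage("a|b", "b"));  // true
        System.out.println(inLanguage("a|b", "ab")); // false

        // Form 4: repetition; L(a*) contains "", "a", "aa", ...
        System.out.println(inLanguage("a*", ""));    // true (zero instances)
        System.out.println(inLanguage("a*", "aaa")); // true
        System.out.println(inLanguage("a*", "ab"));  // false
    }
}
```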
 