Figure 3.14: An FA that scans integer and real literals and the
subrange operator.
appears too soon or too late, the parser can perform error repair or issue a
suitable error message.
3.7.4 Multicharacter Lookahead
We can generalize FAs to look ahead beyond the next input character. This
feature is important for implementing a scanner for Fortran. In Fortran, the
statement DO 10 J = 1,100 specifies a loop, with index J ranging from 1
to 100. In contrast, the statement DO 10 J = 1.100 is an assignment to the
variable DO10J. In Fortran, blanks are not significant except in strings. A
Fortran scanner can determine whether the O is the last character of a DO token
only after reading as far as the comma (or period). (In fact, the erroneous
substitution of a . for a , in a Fortran DO loop once caused a 1960s-era space
launch to fail! Because the substitution resulted in a valid statement, the error
was not detected until runtime, which in this case was after the rocket had been
launched. The rocket deviated from course and had to be destroyed.)
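The decision described above can be sketched in a few lines of Python. This is
only an illustration under simplifying assumptions, not a real Fortran scanner:
the function name classify_statement is hypothetical, string literals and
comments are ignored, and a statement is treated as a loop header only if it
has the shape DO <label> <name> = ... with a comma outside parentheses.

```python
import re


def classify_statement(stmt):
    """Classify a Fortran-style statement and report its first token.

    Returns a (kind, first_token) pair. Assumes no string literals or
    comments appear in stmt (a simplification made for this sketch).
    """
    # Blanks are not significant, so remove them before scanning.
    text = stmt.replace(" ", "").upper()
    head, eq, tail = text.partition("=")
    if not eq or not head.startswith("DO"):
        return ("other", head)

    # A loop header has the shape DO <label digits> <index variable>.
    loop_shaped = re.fullmatch(r"DO\d+[A-Z][A-Z0-9]*", head) is not None

    # Look ahead past the '=' for a comma that is not inside parentheses.
    depth = 0
    for ch in tail:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0 and loop_shaped:
            return ("do-loop", "DO")

    # No top-level comma: the whole left-hand side is one identifier.
    return ("assignment", head)


print(classify_statement("DO 10 J = 1,100"))
print(classify_statement("DO 10 J = 1.100"))
```

The first token cannot be reported until the scan has passed the = and reached
either a top-level comma or the end of the statement, which is exactly the
multicharacter lookahead at issue.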
We have already shown you a milder form of the extended lookahead
problem that occurs in Pascal and Ada. Scanning, for example, 10..100
requires two-character lookahead after the 10. Using the FA of Figure 3.14 and
given 10..100, we would scan three characters and stop in a nonaccepting
state. Whenever we stop reading in a nonaccepting state, we can back up over
accepted characters until an accepting state is found. Characters over which
we back up are rescanned to form later tokens. If no accepting state is reached
during backup, then we have a lexical error and invoke lexical error recovery.
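The backup strategy can be sketched as a small table-driven scanner in Python.
The state names and transition table below are assumptions made for this
sketch, written in the spirit of Figure 3.14 rather than copied from it.
Instead of literally stepping backward, the sketch remembers the position of
the most recent accepting state, which has the same effect.

```python
DIGITS = "0123456789"

# Hypothetical FA: D stands for any digit.
TRANSITIONS = {
    ("start", "D"): "int",
    ("int", "D"): "int",
    ("int", "."): "int_dot",    # nonaccepting: "10." is not yet a token
    ("int_dot", "D"): "real",
    ("real", "D"): "real",
    ("start", "."): "dot",      # nonaccepting: a lone "." is not a token
    ("dot", "."): "dotdot",
}
ACCEPTING = {"int": "INTEGER", "real": "REAL", "dotdot": "SUBRANGE"}


def char_class(ch):
    """Map a character to the symbol used in the transition table."""
    return "D" if ch in DIGITS else ch


def scan(text):
    """Split text into (kind, lexeme) tokens, backing up when needed."""
    tokens = []
    pos = 0
    while pos < len(text):
        state = "start"
        i = pos
        last_accept = None  # (end index, token kind) of last accepting state
        while i < len(text) and (state, char_class(text[i])) in TRANSITIONS:
            state = TRANSITIONS[(state, char_class(text[i]))]
            i += 1
            if state in ACCEPTING:
                last_accept = (i, ACCEPTING[state])
        if last_accept is None:
            # No accepting state was reached: a lexical error.
            raise ValueError(f"lexical error at position {pos}")
        end, kind = last_accept
        tokens.append((kind, text[pos:end]))
        pos = end  # characters in text[end:i] are rescanned as later tokens
    return tokens


print(scan("10..100"))
```

On 10..100 the inner loop reads 1, 0, and the first period, then stops in the
nonaccepting state int_dot. The scanner emits 10 as an integer, and the period
it backed up over is rescanned as the start of the subrange operator.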
In Pascal or Ada, more than two-character lookahead is not needed; this
simplifies the buffering of characters to be rescanned. Alternatively, we can
 
 