Terminal    Regular Expression
floatdcl    "f"
intdcl      "i"
print       "p"
id          [a-e] | [g-h] | [j-o] | [q-z]
assign      "="
plus        "+"
minus       "-"
inum        [0-9]+
fnum        [0-9]+.[0-9]+
blank       (" ")+
Figure 2.3: Formal definition of ac tokens.
2.2.2 Token Specification
Thus far, a CFG formally defines the sequences of terminal symbols that com-
prise a language. The actual input characters that could correspond to each
terminal symbol must also be specified. The ac grammar in Figure 2.1 uses the
assign symbol as a terminal, but that symbol will appear in the input stream
as the = character. The terminal id could be any alphabetic character except
f, i, or p, which are reserved for special use in ac. In most programming
languages, the strings that could correspond to an id are practically unlimited,
and tokens such as if and while are often reserved keywords.
In addition to the grammar's terminal symbols, language definitions often
include elements such as comments, blank space, and compilation directives
that must be properly recognized as tokens in the input stream. The formal
specification of a language's tokens is typically accomplished by associating a
regular expression with each token, as shown in Figure 2.3. A full treatment
of regular expressions can be found in Section 3.2 on page 60.
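As a minimal sketch of how such a specification might be written down in code, the table of Figure 2.3 can be transcribed into Python's regular expression syntax. The names TOKEN_SPEC and classify are illustrative assumptions, not part of ac or of the text, and the decimal point in fnum is escaped here because an unescaped "." matches any character in Python.

```python
import re

# Figure 2.3 transcribed into Python regex syntax, one (terminal, pattern) pair
# per row. TOKEN_SPEC and classify are hypothetical names chosen for this sketch.
TOKEN_SPEC = [
    ("floatdcl", r"f"),
    ("intdcl",   r"i"),
    ("print",    r"p"),
    ("id",       r"[a-e]|[g-h]|[j-o]|[q-z]"),
    ("assign",   r"="),
    ("plus",     r"\+"),              # "+" is a regex operator, so it is escaped
    ("minus",    r"-"),
    ("inum",     r"[0-9]+"),
    ("fnum",     r"[0-9]+\.[0-9]+"),  # literal decimal point, escaped
    ("blank",    r" +"),
]

def classify(lexeme):
    """Return the terminal whose pattern matches the entire lexeme, or None."""
    for name, pattern in TOKEN_SPEC:
        if re.fullmatch(pattern, lexeme):
            return name
    return None
```

Under these assumptions, classify("=") yields "assign" and classify("3.2") yields "fnum", while a string such as "ab" matches no terminal and yields None.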
The specification in Figure 2.3 begins with rules for the language's reserved
keywords: f, i, and p. The specification for id uses the | symbol to specify the
union of four sets, each a range of characters, so that an id is any lowercase
alphabetic character not already reserved. The specification for inum allows
one or more decimal digits. An fnum is like an inum except that it is followed
by a decimal point and then one or more digits.
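The claims about id, inum, and fnum can be checked mechanically. The following sketch, written in Python with hypothetical names (ID, INUM, FNUM, id_letters), enumerates the lowercase letters accepted by the id pattern and defines the two numeric patterns so they can be tested against sample strings.

```python
import re
import string

# Patterns taken from Figure 2.3; the constant and function names are
# assumptions made for this sketch.
ID = re.compile(r"[a-e]|[g-h]|[j-o]|[q-z]")
INUM = re.compile(r"[0-9]+")
FNUM = re.compile(r"[0-9]+\.[0-9]+")  # digits, a literal point, then digits

def id_letters():
    """Return every lowercase letter that the id pattern accepts."""
    return "".join(c for c in string.ascii_lowercase if ID.fullmatch(c))

print(id_letters())  # abcdeghjklmnoqrstuvwxyz
```

The printed string contains every lowercase letter except f, i, and p, which is the union of the four ranges. With the same patterns, FNUM.fullmatch("3.2") succeeds, whereas "3." and ".5" are rejected because fnum requires at least one digit on each side of the decimal point.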
Figure 2.4 illustrates an application of the ac specification to the input
stream shown at the bottom. The tokens corresponding to the input stream
are shown just above the input stream. To save space, the blank tokens are not
shown.
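To illustrate the idea of applying the specification to an input stream, here is a rough sketch of a scanner in Python. It is not the scanner developed in the text: the names TOKEN_SPEC, MASTER, and scan are assumptions, and the sample input in the usage note is illustrative rather than the stream of Figure 2.4. The patterns are combined into one alternation. Because Python tries alternatives in order rather than choosing the longest match, fnum is listed before inum.

```python
import re

# Patterns follow Figure 2.3. fnum precedes inum so that "3.2" is matched as a
# single fnum instead of inum "3" followed by an unmatched ".".
TOKEN_SPEC = [
    ("floatdcl", r"f"),
    ("intdcl",   r"i"),
    ("print",    r"p"),
    ("id",       r"[a-e]|[g-h]|[j-o]|[q-z]"),
    ("assign",   r"="),
    ("plus",     r"\+"),
    ("minus",    r"-"),
    ("fnum",     r"[0-9]+\.[0-9]+"),
    ("inum",     r"[0-9]+"),
    ("blank",    r" +"),
]

# One named group per terminal, joined into a single pattern.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Return (terminal, lexeme) pairs for text, omitting blank tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "blank":  # blanks are recognized but not reported
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, scan("a = 5 + 3.2") returns the pairs (id, a), (assign, =), (inum, 5), (plus, +), (fnum, 3.2). The blanks are consumed but left out of the result, mirroring the way blank tokens are omitted from the figure.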
 
 