Java Reference
In-Depth Information
if it is not already there. Whether the identifier is entered or is already in the
table, a pointer to the symbol table entry is then returned from the scanner.
In block-structured languages, the scanner generally is not expected to
enter or look up identifiers in the symbol table because an identifier can be
used in many contexts (for example, as a variable, member of a class, or label).
The scanner usually cannot know when an identifier should be entered into
the symbol table for the current scope or when it should return a pointer to an
instance from an earlier scope. Some scanners just copy the identifier into a
private string variable (that cannot be overwritten) and return a pointer to it. A
later compiler phase, the type checker, then resolves the identifier's intended
usage.
Sometimes a string space is used to store identifiers in conjunction with a
symbol table (see Chapter 8). A string space is an extendable block of memory
used to store the text of identifiers. A string space eliminates frequent calls to
memory allocators such as new or malloc. It also avoids the space overhead of
storing multiple copies of the same string. The scanner can enter an identifier
into the string space and return a pointer into the string space rather than the
actual text.
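The string space idea can be sketched in Java as follows. This is a minimal illustration, not the book's implementation: the class name StringSpace and its methods are hypothetical, an integer offset stands in for the "pointer into the string space", and the lookup maps are a simplification (a real scanner would more likely hash directly over the buffer than keep String keys).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical string space: one growable buffer holding all identifier text. */
final class StringSpace {
    // The extendable block of memory that stores identifier text back to back.
    private final StringBuilder text = new StringBuilder();
    // Simplification for this sketch: maps identifier text to its start offset
    // so that the same text is stored in the buffer only once.
    private final Map<String, Integer> offsets = new HashMap<>();
    // Length of the identifier that starts at a given offset.
    private final Map<Integer, Integer> lengths = new HashMap<>();

    /** Enters an identifier if needed and returns its offset (the "pointer"). */
    int enter(String identifier) {
        Integer existing = offsets.get(identifier);
        if (existing != null) {
            return existing;              // already stored: no second copy
        }
        int start = text.length();
        text.append(identifier);          // one append, no per-identifier allocation
        offsets.put(identifier, start);
        lengths.put(start, identifier.length());
        return start;
    }

    /** Recovers the identifier text stored at an offset returned by enter. */
    String textAt(int offset) {
        Integer length = lengths.get(offset);
        if (length == null) {
            throw new IllegalArgumentException("no identifier starts at " + offset);
        }
        return text.substring(offset, offset + length);
    }

    /** Total number of characters currently held in the string space. */
    int size() {
        return text.length();
    }
}
```

Entering the same identifier twice returns the same offset and leaves the buffer size unchanged, which is the space saving described above.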
An alternative to a string space is a hash table that stores identifiers and
assigns to each a unique serial number. A serial number is a small integer that
can be used instead of a string space pointer. All identifiers that have the same
text get the same serial number; identifiers with different texts get different
serial numbers. Serial numbers are ideal indices into symbol tables (which
need not be hashed) because they are small, contiguously assigned integers. A
scanner can hash an identifier when it is scanned and return its serial number
as part of the identifier token.
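A minimal sketch of serial-number assignment, assuming Java's HashMap as the hash table. The class name IdentifierTable and its method names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical table that assigns each distinct identifier a serial number. */
final class IdentifierTable {
    private final Map<String, Integer> serialOf = new HashMap<>();
    // Indexed by serial number, so text lookup needs no hashing.
    private final List<String> textOf = new ArrayList<>();

    /** Returns the serial number for the text, assigning the next one if new. */
    int serialFor(String identifier) {
        Integer serial = serialOf.get(identifier);
        if (serial == null) {
            serial = textOf.size();       // serials are 0, 1, 2, ... with no gaps
            serialOf.put(identifier, serial);
            textOf.add(identifier);
        }
        return serial;
    }

    /** Recovers the identifier text for a previously assigned serial number. */
    String text(int serial) {
        return textOf.get(serial);
    }

    /** Number of distinct identifiers seen so far. */
    int count() {
        return textOf.size();
    }
}
```

Because the serial numbers are assigned contiguously from zero, a later phase could use them directly as array indices, for example into a per-identifier list of symbol table entries.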
In some languages, such as C, C++, and Java, case is significant; in others,
such as Ada and Pascal, it is not. When case is significant, identifier text
must be stored or returned exactly as it was scanned. Reserved word lookup
must distinguish between identifiers and reserved words that differ only in
case. However, when case is insignificant, case differences in the spelling of an
identifier or reserved word must be guaranteed to not cause errors. This can
be done by putting all tokens scanned as identifiers into a uniform case before
they are returned or looked up in a reserved word table.
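For a case-insensitive language, the normalization step might look like the sketch below. The class CaseFolding and its small reserved-word set are hypothetical; lower case is chosen arbitrarily as the uniform case, and Set.of assumes Java 9 or later.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for a scanner of a case-insensitive language. */
final class CaseFolding {
    // Assumed reserved-word table, stored in the same uniform (lower) case.
    private static final Set<String> RESERVED =
            Set.of("begin", "end", "if", "then", "while");

    /** Puts scanned identifier text into the uniform case. */
    static String normalize(String scanned) {
        // Locale.ROOT keeps the mapping independent of the user's locale.
        return scanned.toLowerCase(Locale.ROOT);
    }

    /** True if the scanned text is a reserved word under any capitalization. */
    static boolean isReserved(String scanned) {
        return RESERVED.contains(normalize(scanned));
    }
}
```

With this in place, BEGIN, Begin, and begin are all recognized as the same reserved word, while an identifier such as MyVar is returned as myvar regardless of how it was typed.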
Other tokens, such as literals, require processing before they are returned.
Integer and real (floating) literals are converted to numeric form and returned
as part of the token. Numeric conversion can be tricky because of the danger
of overflow or roundoff errors. It is wise to use standard library routines such
as atoi and atof (in C) or Integer.intValue and Float.floatValue (in Java).
For string literals, a pointer to the text of the string (with escaped characters
expanded) should be returned.
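The literal processing described above might be sketched as follows. Note the substitutions: this sketch uses the static parsing methods Integer.parseInt and Double.parseDouble rather than the routines named in the text, because they accept the scanned text directly and Integer.parseInt reports overflow by throwing NumberFormatException. The class name LiteralConversion and the set of recognized escapes are assumptions made for illustration.

```java
/** Hypothetical helpers that turn scanned literal text into token values. */
final class LiteralConversion {
    /** Converts an integer literal; throws NumberFormatException on overflow. */
    static int toInt(String digits) {
        return Integer.parseInt(digits);
    }

    /** Converts a real literal, rejecting values too large for a double. */
    static double toReal(String text) {
        double value = Double.parseDouble(text);
        // parseDouble returns infinity rather than throwing when out of range.
        if (Double.isInfinite(value)) {
            throw new NumberFormatException("real literal out of range: " + text);
        }
        return value;
    }

    /** Expands an assumed escape set (\n, \t, \\, \") in a string literal body. */
    static String expandEscapes(String body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                char next = body.charAt(++i);
                switch (next) {
                    case 'n':  out.append('\n'); break;
                    case 't':  out.append('\t'); break;
                    case '\\': out.append('\\'); break;
                    case '"':  out.append('"');  break;
                    default:   out.append('\\').append(next); // unknown: keep as-is
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

A scanner built this way would catch NumberFormatException and report a lexical error for the offending literal instead of silently returning a wrapped or infinite value.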
The design of C contains a flaw that requires a C scanner to do a bit of
special processing. The character sequence a(*b); can be a call to procedure