Parsing - Introduction to Compiler Construction in a Java World

Java Reference

In-Depth Information

If unambiguous grammars are so important, why do we see so many languages having

this ambiguous conditional statement? All of j--, Java, C, and C ++ include precisely this

same ambiguous conditional construct. One could easily modify the syntax of conditionals

to remove the ambiguity. For example, the programming languages S-Algol and Algol-S

define conditionals so:

S ::= if E do S

j if E then S else S

j s

E ::= e

(3.14)

Here, the do indicates the simpler conditional (having no else clause) while the then

indicates the presence of an else clause. A simple solution but it does change the language.

But programmers have become both accustomed to (and fond of) the ambiguous condi-

tional. So both programmers and compiler writers have learned to live with it. Programming

language reference manuals include extra explanations, such as In the conditional statement,

the else clause goes with the nearest preceding if .

And compiler writers handle the rule as a special case in the compiler's parser, making

sure that an else is grouped along with the closest preceding if .

The j-- grammar (and the Java grammar) have another ambiguity, which is even more

dicult. Consider the problem of parsing the expression

x.y.z.w

Clearly, w is a field; if it were a method expression, then that would be evident in the syntax:

x.y.z.w()

But, what about x.y.z ? There are several possibilities depending on the types of the

names x , y , and z .

If x is the name of a local variable, it might refer to an object with a field y , referring

to another object with a field z , referring to yet another object having our field w . In

this case, the expression would be parsed as a cascade of field selection expressions.

Alternatively, x.y might be the name of a package in which the class z is defined.

In this case, we would parse the expression by first parsing x.y.z as a fully qualified

class and then parse w as a static field selection of that class.

Other possibilities, parsing various permutations of (possibly qualified) class names

and field selection operations, also exist.

The parser cannot determine just how the expression x.y.z is parsed because types

are not decided until after it has parsed the program and constructed its abstract syntax

tree (AST). So the parser represents x.y.z in the AST by an AmbiguousName node. Later

on, after type declarations have been processed, the compiler rewrites this node as the

appropriate sub-tree.

These two examples of ambiguity in j-- (and in Java) make the point that in compiling

everyday programming languages, one must deal with messiness. One relies on formal tech-

niques where practical, because formal techniques lead to more correct compilers. But once

in awhile, formality fails and one must deal with special cases.

In general, there is no algorithm that can tell us whether or not an arbitrary grammar

is ambiguous. That is, ambiguity is not decidable. But there are large classes of grammars

(and so programming languages) that both can be shown to be decidable and for which

ecient parsers can be automatically constructed. These classes of grammars are large

Introduction to Compiler Construction in a Java World

Search WWH ::

Custom Search

Home