Error Detection in Operator Precedence Parsers
Errors can be detected in operator precedence parsers by making use of the blank entries in the operator precedence matrix. When such a blank entry is referenced in the parsing algorithm, a routine can be invoked that will report the error. The error message generated depends on the nature of the error.
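This mechanism can be sketched in a few lines. The matrix entries, token names, and message text below are illustrative assumptions, not the book's table; a blank entry is simply an absent key whose lookup invokes the error routine:

```python
# Fragment of an operator precedence matrix; the entries shown are
# illustrative.  Absent pairs correspond to blank entries in the matrix.
PREC = {
    ('i', '+'): '>',
    ('+', 'i'): '<',
    ('(', 'i'): '<',
    ('i', ')'): '>',
}

def relation(left, right):
    """Return the precedence relation between two symbols, invoking an
    error routine (here, raising SyntaxError) on a blank entry."""
    rel = PREC.get((left, right))
    if rel is None:
        raise SyntaxError(f"illegal symbol pair: {left!r} followed by {right!r}")
    return rel
```

The message can be specialized per blank entry; the pair ('i', 'i'), for instance, might report a missing operator between two operands.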
Table 7-12 Allowable adjacent operator pairs

[Rows and columns are the operators i, -, /, (, and ); an X marks each pair of operators that may appear adjacent in a sentential form. The individual entries are not preserved in this copy.]
Appropriate and meaningful error messages can be created by a compiler writer who is very familiar with the programming-language syntax. Perhaps the reader should recall at this point the discussion of errors given in Chap. 5.
Errors can also be detected by constructing, from the given grammar, a table of allowable adjacent operator pairs. In the grammar, adjacent character pairs in the right-hand sides of the rules can be used to generate the table. For example, the adjacent character pair E- (from the rule E → E - T) indicates that the adjacent operator pairs i- and )- are permitted. This follows by noting that the operators i and ) are rightmost derivable from E. Similarly, the character pair (E implies that the operator pairs (( and (i are also allowable adjacent operator pairs, since ( and i are leftmost operator derivable from E. When this process is continued, the set of allowable adjacent operator pairs given in Table 7-12 results. An algorithm to generate such a table is left as an exercise.
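The table-generation process can be sketched as follows. The productions used here (E → E - T | T, T → T / F | F, F → (E) | i) are an assumption consistent with the operators appearing in Table 7-12, and all Python names are illustrative. A fixpoint first computes the operators leftmost (and rightmost) derivable from each nonterminal; the table is then the union, over adjacent pairs in each right-hand side, of the corresponding edge-operator products:

```python
GRAMMAR = {                        # assumed productions
    'E': [['E', '-', 'T'], ['T']],
    'T': [['T', '/', 'F'], ['F']],
    'F': [['(', 'E', ')'], ['i']],
}
NONTERMS = set(GRAMMAR)

def edge_ops(grammar, nonterms, leftmost=True):
    """Operators leftmost (or rightmost) derivable from each nonterminal."""
    ops = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                seq = rhs if leftmost else rhs[::-1]
                new = set()
                if seq[0] in nonterms:
                    new |= ops[seq[0]]
                    if len(seq) > 1:     # in an operator grammar the next
                        new.add(seq[1])  # symbol is a terminal (operator)
                else:
                    new.add(seq[0])
                if not new <= ops[A]:
                    ops[A] |= new
                    changed = True
    return ops

def adjacent_pairs(grammar, nonterms):
    """Allowable adjacent operator pairs, in the spirit of Table 7-12."""
    left = edge_ops(grammar, nonterms, leftmost=True)
    right = edge_ops(grammar, nonterms, leftmost=False)
    pairs = set()
    for rhss in grammar.values():
        for rhs in rhss:
            for X, Y in zip(rhs, rhs[1:]):
                xs = right[X] if X in nonterms else {X}
                ys = left[Y] if Y in nonterms else {Y}
                pairs |= {(a, b) for a in xs for b in ys}
    return pairs
```

With the assumed grammar, the pair (E, -) contributes i- and )-, and the pair ((, E) contributes (( and (i, as described above.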
It should be noted that since relationships exist only between operator symbols, the phrase structure of the grammar involving nonterminals is essentially ignored. Consequently, nonterminals cannot be used to indicate which portions of an input string are in error. Furthermore, since the parser produces only a skeleton parse for an input string, reductions of the form A → B, where B is a nonterminal, do not occur. Therefore, it is possible to accept an input string that is syntactically invalid. A final comment concerns parsers that use precedence functions instead of the operator precedence matrix. Parsers using these functions may take longer to detect an error than if the corresponding precedence matrix had been used. Examples of this possibility are given for simple precedence parsers in the next section.
SIMPLE PRECEDENCE GRAMMARS
Wirth and Weber (1966) alleviated some of the shortcomings of operator precedence grammars by generalizing the notion of precedence to all symbols (i.e., terminal and nonterminal) of a grammar. This generalization leads to the definition of simple precedence grammars. Unlike operator precedence parsers which produce only skeleton parses, simple precedence parsers produce complete parses. Also the phrase structure of a simple precedence grammar is captured in a simple precedence matrix because both terminal and nonterminal symbols can be related. Error detection and recovery can be more easily formalized in simple precedence parsers than is possible in operator precedence parsers.
Notions and Use of Precedence Relations
In this section we are again concerned with a left-to-right bottom-up parsing technique which involves no backup. This technique can be viewed as a generalization of the operator precedence parsing strategy. Unlike the operator precedence parser which detects a leftmost prime phrase, however, this method constructs the parse for a sentence by repeatedly finding the handle in each sentential form. As was previously mentioned, once the handle has been found, we must also determine which production to apply, that is, to which nonterminal the handle should be reduced. This parsing method will work for a class of grammars (which is a proper subset of the context-free grammars) called the simple precedence grammars. Since the proposed method proceeds in a left-to-right bottom-up manner, all the sentential forms will represent the converse of a rightmost derivation. In the remainder of the discussion on precedence grammars, all derivations are understood to be the converse of such a rightmost derivation.
Let us now investigate how the handle in the sentential form might be found. Since the parsing method is left to right, we want to scan the sentential form from left to right, looking at only two adjacent symbols at one time so as to be able to determine the rightmost symbol of the handle which is called its tail. Starting at the tail of the handle, and again using only two adjacent symbols, we will then scan from right to left and find the leftmost symbol in the handle which is called its head. We are then faced with the following problem. Given some sentential form
where … denotes a possibly empty string of symbols, four possibilities are obvious: S1 is the tail of the handle, both S1 and S2 are in the handle, S2 is in the handle but S1 is not, and neither S1 nor S2 is in the handle. This last case is impossible because the grammar is reduced and we are in the process of obtaining the converse of a rightmost derivation.
We would like to develop certain relations from the grammar which would permit us to make such a decision. For any pair of symbols S1 and S2 in the vocabulary of the grammar, assume there is a sentential form …S1S2… . At some stage in the reverse of the rightmost derivation, either S1 or S2 or both must be in the handle. The following three cases can occur:
1. S1 is part of the handle (actually its tail) and S2 is not. This relationship is denoted as S1 ·> S2, which signifies that S1 has precedence over S2, because S1 must be reduced before S2. Formally, if the sentential form is …βS2…, where β is the handle, then there is a rule U → …S1; that is, S1 is the tail of β.

2. S1 and S2 are both contained in the handle. This relationship is specified as S1 ≐ S2, which means that both symbols have the same precedence and are to be reduced at the same time. This implies that there is a rule of the form U → …S1S2… in the grammar.

3. S2 is contained in the handle (actually its head) and S1 is not. We denote such a relationship as S1 <· S2, which signifies that S2 has precedence over S1. The grammar must contain a rule of the form U → S2….
These three cases can be interpreted pictorially in Fig. 7-4a to c.
Figure 7-4 Interpretation of precedence relations.
As an example, consider the grammar whose set of productions is
and i represents a variable name. The language generated by this grammar is the set of parenthesized arithmetic expressions consisting of variables and the operations of addition and multiplication. Each column given in Fig. 7-5 represents a sentential form, a syntax tree for it, the handle of the sentential form, and the precedence relations that can be derived from the tree after repeated reductions. For example, the handle of the first sentential form in Fig. 7-5 is i, and the relations shown there hold. When the corresponding reductions are made, F becomes the handle of the resulting sentential form, which yields additional relations.
Continuing in this manner, we can obtain other relations between other pairs of symbols. A matrix which displays all the precedence relations for the example is given in Table 7-13, where a blank entry indicates that no relationship exists between a pair of symbols.
The approach used in Fig. 7-5 has yielded a number of relations for the symbols of the example grammar; however, it can be seen from the table that not all of them were obtained from the two trees considered. We could examine other syntax trees until all such relations were obtained. As was the case in operator precedence grammars these relations can also be obtained rather easily.
If at most one relation holds between any pair of symbols, the precedence relations can be used to determine the handle of a sentential form. If, however, more than one relation holds between a pair of symbols, this approach may not work. In the former case the handle of the sentential form S1S2…Sn is the leftmost substring Si…Sj such that

Si-1 <· Si ≐ Si+1 ≐ … ≐ Sj ·> Sj+1
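Assuming at most one relation holds per pair, the two scans can be sketched directly. The small matrix below holds only the few entries the example needs and is illustrative, not Table 7-13:

```python
def find_handle(form, prec):
    """Return (i, j) such that form[i..j] is the handle: scan left to right
    for the tail (first symbol ·> its right neighbour), then right to left
    for the head (first symbol <·-related to its left neighbour)."""
    j = 0
    while j < len(form) - 1 and prec.get((form[j], form[j + 1])) != '>':
        j += 1
    i = j
    while i > 0 and prec.get((form[i - 1], form[i])) != '<':
        i -= 1
    return i, j

# Illustrative relations around the sentential form # i + i #:
PREC = {('#', 'i'): '<', ('i', '+'): '>', ('+', 'i'): '<', ('i', '#'): '>'}
```

For the form # i + i #, the call find_handle(['#', 'i', '+', 'i', '#'], PREC) locates the leftmost i as a one-symbol handle.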
Table 7-13 Precedence relations for the example grammar

[Rows and columns are the symbols E, U, T, F, V, +, *, i, (, and ); each entry is one of the relations <·, ≐, or ·>, and a blank entry indicates that no relationship exists between the pair. The individual entries are not preserved in this copy.]
Table 7-14 Parse for the string i + i * i in the example grammar

Step | Sentential form | Handle | Reduction | Direct derivation obtained

[The eleven rows of the parse are not preserved in this copy.]
As an example, Table 7-14 gives a parse for the sentence i + i * i. Each step in the table gives the sentential form along with the relations that exist between the symbols according to Table 7-13.
Formal Definition of Precedence Relations
In the previous subsection the precedence relations were defined in terms of syntax trees. We now redefine these relations in terms of the productions of the grammar. It will be shown that these relations can be computed easily from the grammar.
We first define two relations associated with any grammar that will be important when defining precedence relations. For some particular grammar and nonterminal U, the set of head symbols of the strings that can be produced by U will be required. The set of such symbols, head(U), is defined as

head(U) = {X | U ⇒+ X…}
Since the mechanical evaluation of head(U) may be cumbersome, we redefine it in terms of another relation over a finite set in the following manner. Let L be a relation over the vocabulary such that

U L X if there exists a production U → X…; that is, X is the leftmost symbol in some right-hand side of a rule for U.
The transitive closure (Sec. 2-1.2) of this relation can be obtained as

U L+ X iff there exists some sequence of rules (at least one) such that U → U1…, U1 → U2…, …, Um → X…

It is clear that

head(U) = {X | U L+ X}

The reflexive transitive closure L* can be defined as

U L* X iff U L+ X or U = X
In an analogous manner, the set of tail symbols, tail(U), is defined by

tail(U) = {X | U ⇒+ …X}

Another relation R over the vocabulary is defined as

U R X if there exists a production U → …X; that is, X is the rightmost symbol in some right-hand side of a rule for U.
Again R+ and R* are defined in an obvious manner. These relations are easily computed using Warshall’s algorithm. The two basic relations can now be used to define the precedence relations.
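Warshall's algorithm for the transitive closure of a relation is short enough to sketch directly; the Boolean-matrix encoding below is an illustrative assumption:

```python
def warshall(matrix):
    """Warshall's algorithm: transitive closure of a relation given as an
    n-by-n Boolean matrix.  closure[i][j] is True iff symbol j is reachable
    from symbol i in one or more steps of the relation."""
    n = len(matrix)
    closure = [row[:] for row in matrix]
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = True
    return closure
```

Applied to the matrix of L this yields L+; taking the union with the identity matrix then gives L* (and similarly for R+ and R*).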
Definition 7-4 The precedence relations for a grammar G are defined over its vocabulary as follows:

1. S1 ≐ S2 iff there exists a production U → …S1S2….
2. S1 <· S2 iff there exists a production U → …S1V… such that V L+ S2 holds.
3. S1 ·> S2 iff S2 is a vocabulary symbol and there is a production U → …WV… such that W R+ S1 and V L* S2 hold.
The relations L+ and R+ for the example grammar in the previous subsection are given in Table 7-15.
The relation ≐ is easily obtained from the productions of the grammar. The relation <· can be evaluated by considering the adjacent symbol pairs, in the right-hand sides of the productions, whose second symbol is a nonterminal, such as the pair (E. For each such pair, say (E, the relations ( <· X hold for every symbol X with E L+ X. The relation ·> is computed in a similar manner. Using this procedure, Table 7-13 can be easily verified.
Table 7-15

Nonterminal symbol | L+ | R+

[The L+ and R+ entries for the nonterminals E, U, T, V, and F are not preserved in this copy.]
These precedence relations are actually equivalent to those defined in terms of handles in the previous subsection.
Theorem 7-2 S1 ≐ S2 iff the substring S1S2 appears in a handle of some sentential form.
Proof Let us first assume that S1 ≐ S2. A syntax tree which has the substring S1S2 in its handle must be constructed. By Definition 7-4(1) there exists a production of the form U → αS1S2β. Since the grammar is assumed to be reduced, we have S ⇒* φUψ for some φ and ψ, where S is the starting symbol of the grammar. The desired syntax tree can then be constructed as follows:

1. Construct the syntax tree for the sentential form φUψ.
2. Reduce the tree obtained in step 1 until U becomes part of the handle.
3. Modify the tree obtained in step 2 by extending the tree downward with the application of the production U → αS1S2β. Since U was in the handle of the previous tree, the handle of the presently constructed tree contains αS1S2β, and we therefore have the required tree.

Conversely, if S1S2 is a substring of the handle, then by definition of a handle there must exist a production U → …S1S2…, and S1 ≐ S2 follows from Definition 7-4(1).
The proofs for the relations <· and ·> are left as exercises.
The formal definitions of the three precedence relations given here are equivalent to those given informally in terms of syntax trees in Sec. 7-3.1. Some insight into these definitions can be obtained by looking at their interpretation from a syntax-tree point of view. A simple precedence grammar is defined as follows:
Definition 7-5 A grammar G is called a simple precedence grammar if the following conditions are satisfied:
1. For any pair of symbols in the vocabulary, at most one of the relations <·, ≐, and ·> holds.
2. No two productions can have the same right-hand side.
3. Empty rules are not allowed.
This class of grammars is called simple because only one symbol on each side of a possible handle is used to determine the presence of the handle. In other words, the handle is determined by scanning the sentential form from left to right, examining successive pairs of adjacent symbols until S1 ·> S2; in this case S1 is the tail of the handle and S2 is the right context symbol used to find it. Similarly, the head of the handle is found by scanning from right to left until S1 <· S2, where S2 is now the head of the handle and S1 is the left context symbol. So the first condition of the definition tells us how to find the handle, while the second condition tells us which production to use.
Theorem 7-3 A simple precedence grammar is unambiguous. Furthermore, each sentential form S1S2…Sn has a unique handle, which is the leftmost substring Si…Sj that satisfies

Si-1 <· Si ≐ Si+1 ≐ … ≐ Sj ·> Sj+1
Note that we are assuming the presence of a special symbol # such that # <· X and X ·> # for any symbol X in the original vocabulary.
Once the precedence relations for a simple precedence grammar have been determined, the parsing of a sentence is straightforward. This is one of the simplest classes of grammars that can be used in a practical manner in compiler writing. The relations <· and ·> are easily computed from the basic relations L, R, and ≐. From the definition of the product of relations and the definition for <·, we can easily obtain

<· = ≐ ∘ L+

where ∘ denotes the product operator. Similarly, the relation ·> can be expressed as

·> = (R+)T ∘ ≐ ∘ L*
where (R+)T is the transpose of R+. These relations can obviously be expressed as bit (Boolean) matrices; the logical capabilities of modern computers make their evaluation easy. Observe that the precedence matrix in the previous discussion has included columns which correspond to the nonterminal symbols of the grammar. The inclusion of these columns facilitates reasonable error recovery. We next turn to the formulation of a parsing algorithm for simple precedence grammars.
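The bit-matrix evaluation can be sketched as follows; the relation products are those just described, while the tiny matrices used in testing are illustrative stand-ins for the ≐, L+, and R+ of an actual grammar:

```python
def bool_mult(A, B):
    """Boolean matrix product: C[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    n = len(A)
    return [[A[j][i] for j in range(n)] for i in range(n)]

def bool_or(A, B):
    n = len(A)
    return [[A[i][j] or B[i][j] for j in range(n)] for i in range(n)]

def identity(n):
    return [[i == j for j in range(n)] for i in range(n)]

def precedence_relations(EQ, L_plus, R_plus):
    """LESS = EQ o L+  and  GREATER = (R+)^T o EQ o L*,
    where L* = L+ union the identity relation."""
    L_star = bool_or(L_plus, identity(len(L_plus)))
    less = bool_mult(EQ, L_plus)
    greater = bool_mult(bool_mult(transpose(R_plus), EQ), L_star)
    return less, greater
```

Each relation is a Boolean matrix indexed by vocabulary symbols, so the whole computation reduces to a handful of Boolean matrix products.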
Parsing Algorithm for Simple Precedence Grammars
We wish to formulate a general parsing algorithm for any simple precedence grammar. The precedence matrix and the productions of the grammar are represented in some suitable form.
Basically the algorithm works in the following manner. The symbols of the given input string are scanned from left to right and placed on a stack S. This scanning continues until the relation between the symbol at the top of the stack and the next input symbol is >. At this point the top element of the stack denotes the tail of the handle. The head of the handle is determined by comparing successive pairs of adjacent symbols in the stack until a pair is encountered that is < related. This handle can then be reduced (i.e., removed from the stack) to the left part of a production (which is placed on the stack) whose right part is that handle. The entire process is repeated until the stack contains the starting symbol for the grammar and the next input symbol is the special symbol #.
Algorithm SP_PARSE. This is the general parsing algorithm for any simple precedence grammar. By using the precedence matrix and the productions for a particular grammar, this algorithm determines whether or not a given input sentence x1x2…xn is valid. The algorithm uses a parse stack S, whose top entry is indicated by TOP. The current input symbol is stored in INPUT. The local variable i is used to find the head of the handle on the stack, and k gives the position of the current input symbol in the input sentence.
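A sketch of this algorithm in Python follows. The toy grammar S → aSb | c, its precedence entries, and all names are illustrative assumptions, not the book's example grammar; the characters <, =, > stand for the relations <·, ≐, ·>:

```python
# Illustrative precedence matrix for the toy grammar S -> aSb | c.
PREC = {
    ('#', 'a'): '<', ('#', 'c'): '<', ('a', 'a'): '<', ('a', 'c'): '<',
    ('a', 'S'): '=', ('S', 'b'): '=', ('c', 'b'): '>', ('b', 'b'): '>',
    ('b', '#'): '>', ('c', '#'): '>',
}
# Right-hand side -> left part; legal since no two productions share a RHS.
PRODUCTIONS = {('c',): 'S', ('a', 'S', 'b'): 'S'}

def sp_parse(tokens, prec, productions, start):
    """Parse `tokens`; return the list of reductions performed, or raise
    SyntaxError when a blank matrix entry is referenced."""
    stack = ['#']
    inp = tokens + ['#']
    k = 0
    reductions = []
    while True:
        if stack == ['#', start] and inp[k] == '#':
            return reductions                       # accept
        rel = prec.get((stack[-1], inp[k]))
        if rel in ('<', '='):                       # shift the input symbol
            stack.append(inp[k])
            k += 1
        elif rel == '>':                            # top of stack is the tail:
            i = len(stack) - 1                      # scan down for the head
            while prec.get((stack[i - 1], stack[i])) != '<':
                i -= 1
            handle = tuple(stack[i:])
            lhs = productions[handle]               # reduce handle to left part
            del stack[i:]
            stack.append(lhs)
            reductions.append((handle, lhs))
        else:                                       # blank entry: error
            raise SyntaxError(f"no relation between {stack[-1]!r} and {inp[k]!r}")
```

For the sentence acb this performs the reductions c → S and aSb → S and then accepts.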
Table 7-16

Step | Input string | Stack contents | Relation | Input | Handle

[The sixteen rows (steps 0 to 15) of the trace are not preserved in this copy.]
A trace of this algorithm for the example grammar, using the sentence i * i + i and Table 7-13, is given in Table 7-16.