OPERATOR PRECEDENCE GRAMMARS
In Sec. 7-1 the technique of establishing precedence relations between symbols was used to convert infix expressions to their suffix Polish equivalents. This conversion was based on the use of the input and stack precedence functions (/ and g functions, respectively).
This section describes one of the earliest formal parsing techniques which is based on the notion of operator precedence. This parsing method, which is due to Floyd (1963), involves establishing precedence relations between the operator (terminal) symbols of a grammar. A generalization of this method to involve precedence relations between all symbols (terminal and nonterminal) of a grammar is given in the next section.
Notions and Use of Operator Precedence Relations
In this subsection we generalize the idea of operator precedence which was introduced in the previous section. Through the use of syntax trees, we can obtain an informal understanding of this generalization. Throughout this section, the term "operator" is used to mean a member of the terminal alphabet, VT.
The parsing method that we present here constructs a parse in essentially a left-to-right manner. The parse thus obtained is not a converse rightmost derivation as described in Sec. 2-4.2. The method, however, is a bottom-up technique which involves no backup.
The generalization of the precedence functions introduced in the previous section results in precedence relations between operator (terminal) symbols only. This means that no relations concerning nonterminal symbols exist. Consequently, the parsing method does not repeatedly locate the handle of each sentential form in the parse. Since the relations are only between operator symbols, the string of symbols reduced at each point in the parse must contain at least one operator symbol.
We now define the class of grammars which is the subject of the discussion in this section.
Definition 7-1 A grammar G is an operator grammar if it contains no rules of the form V -> aXYfi, where XJe VN\ that is, the occurrence of consecutive nonterminals in the right part of any rule is forbidden. Also, empty rules are not permitted. We call the language generated by an operator grammar an operator language.
The following is an example of an operator grammar in which the start symbol is E:
The operator language associated with this grammar is the set of parenthesized arithmetic expressions consisting of variables and the operations of addition, subtraction, multiplication, division, and exponentiation.
Recall from Sec. 2-4.1 the definition of a phrase associated with a sentential form. A prime phrase is a phrase which must contain at least one terminal symbol of the grammar, but no prime phrase other than itself. For example, in the current example grammar, the sentential form
contains only the prime phrases P * P and i. All other phrases either are nonterminals or contain one of these prime phrases. At each stage in the parse we want to reduce the leftmost prime phrase. This is why this method of parsing is classified as a essentially left-to-right method.
The previous restriction on grammatical rules in an operator grammar gives rise to important properties concerning its sentential forms. In particular, it can be shown that any sentential form in an operator grammar cannot contain two consecutive nonterminals. The proof of this fact is left as an exercise. (The proof is by induction on the length of the derivation.)
The use of syntax trees can yield the relationships between operator symbols. Each column given in Fig. 7-1 represents a sentential form, a syntax tree for it, the leftmost prime phrase of this sentential form, and the precedence relations that can be derived from the tree after repeated reductions. For example, the leftmost prime phrase of the sentential formin Fig. 7-1 is the leftmost i, and the relationshold. By the relationwe mean that i will be reduced before *. Similarly,denotes that / will be reduced before +.
When the reduction of i to P is made, i becomes the leftmost prime phrase of the sentential formwhich yields the relationsContinuing in this manner, we can obtain other relations between other pairs of operator symbols. A matrix which displays all the operator precedence relations for the example grammar is given in Table 7-7, where a blank entry indicates that no relationship exists between a pair of operator symbols. Note that in the tabledenotes that both (and) will be reduced together.
The approach used in Fig. 7-1 yielded some relations for some of the operator symbols in the example grammar; however, it is clear from the table that not all relationships were obtained from the two trees considered. We could examine other syntax trees until all such relations were obtained. We shall show in the next subsection that all these relations can be obtained rather easily.
The obvious question which arises at this point is: How do these relations help us to obtain the parse of a sentence? It turns out that if at most one relation holds between any pair of terminal symbols, the operator precedence relations can be used to determine the leftmost prime phrase of a sentential form. If, however, more than one relation holds between a pair of operator symbols, this approach may not work.
Table 7-7 Operator precedence relations for sample grammar
Figure 7-1 Obtaining precedence relations from syntax trees.
Assuming that at most one relation holds between any pair of terminal symbols, there is a straightforward technique for finding the leftmost prime phrase of each sentential form. Let a general sentential form be written as
whereThe term [A,] denotes that the nonterminal At can be present or absent in the sentential form. Note, also, that from the previous discussion two adjacent nonterminals cannot occur in the given sentential form. The leftmost prime phrase of this sentential form is the leftmost substring satisfying the conditions
More will be said about this fact in the next subsection. The detection of the leftmost prime phrase c,…cj is based on the context symbols cl_1 and cJ + l, where c,_1 denotes the left context and cJ+1 denotes the right context. In certain instances, we may not have any left and/or right context symbols. In order to prevent this problem, we assume the existence of the special character # such that
A given sentential form s is surrounded by instances of the special character, giving the string #s#. We are now guaranteed the existence of left and right context symbols.
Using the example grammar introduced earlier, let us now, in an informal manner, attempt to construct a syntax tree for the sentence /*(; + /). Padded with #’s on both ends, the sentence is #/*(/ + /)#. Scanning this augmented sentence from left to right and using operator precedence relations, it is seen from Table 7-8 that
Therefore, / is the leftmost prime phrase of the initial sentential form. Since the precedence table gives no information about nonterminals per se, we are unable to determine which nonterminal we must reduce the phrase to. Consequently the prime phrase i is replaced by the nonterminal symbol Xv This substitution yields the new sentential form
By continuing the scan, we obtain the relations
This indicates that the leftmost / is a prime phrase. We then reduce this phrase to a nonterminal X2. After this reduction we obtain the revised sentential form
The continuation of the scan yields the relations
From these relationships it is clear that / is again a prime phrase. This phrase reduces to the nonterminal X3, and the resulting sentential form is
At this point
so X2 + X3 is :> prime phrase. A reduction of this phrase to X4 produces the new sentential form
The relations which now hold are
Consequently, we replace the prime phrase by the nonterminal X5. This substitution results in the sentential form
Step |
Sentential form |
|
|
Leftmost prime phrase |
Reduction |
||
Since
a final reduction of the prime phrase ^ ♦ X5 to X6 yields the sentential form
The reader probably has noticed that the parse of the given sentence constructed in the previous fnanner is not complete. The syntax tree obtained so far is displayed in Fig. 7-2a, with a summary of the parse given in Table 7-8. This tree is merely a skeleton of the complete syntax tree given in Fig. 7-2b.
The absence of certain portions of the complete derivation is obvious when both trees are compared. The omitted portions of the derivations consist of sentential forms where one nonterminal has been reduced to another nonterminal. Since the precedence relations exist only between terminal symbols, it is not surprising that these omissions occur. Clearly, we can complete the derivation by filling in the gaps; namely, we must complete the following:
Figure 7-2 Parse trees for the sentence
In this example,
These partial derivations are obtained easily from the rules of the grammar.
The approach taken in this subsection depends exclusively on the precedence table associated with a grammar. The precedence table for our sample grammar was obtained in an ad hoc manner. It would be desirable to be able to generate such a precedence table in a more formal manner. In fact, this goal can be achieved mechanically by examining the rules of the grammar. We turn now to this formalization.