Bottom-Up Parsing (Compiler Writing) Part 2

OPERATOR PRECEDENCE GRAMMARS

In Sec. 7-1 the technique of establishing precedence relations between symbols was used to convert infix expressions to their suffix Polish equivalents. This conversion was based on the use of the input and stack precedence functions (/ and g functions, respectively).

This section describes one of the earliest formal parsing techniques which is based on the notion of operator precedence. This parsing method, which is due to Floyd (1963), involves establishing precedence relations between the operator (terminal) symbols of a grammar. A generalization of this method to involve precedence relations between all symbols (terminal and nonterminal) of a grammar is given in the next section.

Notions and Use of Operator Precedence Relations

In this subsection we generalize the idea of operator precedence which was introduced in the previous section. Through the use of syntax trees, we can obtain an informal understanding of this generalization. Throughout this section, the term "operator" is used to mean a member of the terminal alphabet, VT.

The parsing method that we present here constructs a parse in essentially a left-to-right manner. The parse thus obtained is not a converse rightmost derivation as described in Sec. 2-4.2. The method, however, is a bottom-up technique which involves no backup.

The generalization of the precedence functions introduced in the previous section results in precedence relations between operator (terminal) symbols only. This means that no relations concerning nonterminal symbols exist. Consequently, the parsing method does not repeatedly locate the handle of each sentential form in the parse. Since the relations are only between operator symbols, the string of symbols reduced at each point in the parse must contain at least one operator symbol.


We now define the class of grammars which is the subject of the discussion in this section.

Definition 7-1 A grammar G is an operator grammar if it contains no rules of the form V -> aXYfi, where XJe VN\ that is, the occurrence of consecutive nonterminals in the right part of any rule is forbidden. Also, empty rules are not permitted. We call the language generated by an operator grammar an operator language.

The following is an example of an operator grammar in which the start symbol is E:

tmp245-1_thumb

The operator language associated with this grammar is the set of parenthesized arithmetic expressions consisting of variables and the operations of addition, subtraction, multiplication, division, and exponentiation.

Recall from Sec. 2-4.1 the definition of a phrase associated with a sentential form. A prime phrase is a phrase which must contain at least one terminal symbol of the grammar, but no prime phrase other than itself. For example, in the current example grammar, the sentential form

tmp245-2_thumb

contains only the prime phrases P * P and i. All other phrases either are nonterminals or contain one of these prime phrases. At each stage in the parse we want to reduce the leftmost prime phrase. This is why this method of parsing is classified as a essentially left-to-right method.

The previous restriction on grammatical rules in an operator grammar gives rise to important properties concerning its sentential forms. In particular, it can be shown that any sentential form in an operator grammar cannot contain two consecutive nonterminals. The proof of this fact is left as an exercise. (The proof is by induction on the length of the derivation.)

The use of syntax trees can yield the relationships between operator symbols. Each column given in Fig. 7-1 represents a sentential form, a syntax tree for it, the leftmost prime phrase of this sentential form, and the precedence relations that can be derived from the tree after repeated reductions. For example, the leftmost prime phrase of the sentential formtmp245-3_thumbin Fig. 7-1 is the leftmost i, and the relationstmp245-4_thumbhold. By the relationtmp245-5_thumbwe mean that i will be reduced before *. Similarly,tmp245-6_thumbdenotes that / will be reduced before +.

When the reduction of i to P is made, i becomes the leftmost prime phrase of the sentential formtmp245-7_thumbwhich yields the relationstmp245-8_thumbContinuing in this manner, we can obtain other relations between other pairs of operator symbols. A matrix which displays all the operator precedence relations for the example grammar is given in Table 7-7, where a blank entry indicates that no relationship exists between a pair of operator symbols. Note that in the tabletmp245-9_thumbdenotes that both (and) will be reduced together.

The approach used in Fig. 7-1 yielded some relations for some of the operator symbols in the example grammar; however, it is clear from the table that not all relationships were obtained from the two trees considered. We could examine other syntax trees until all such relations were obtained. We shall show in the next subsection that all these relations can be obtained rather easily.

The obvious question which arises at this point is: How do these relations help us to obtain the parse of a sentence? It turns out that if at most one relation holds between any pair of terminal symbols, the operator precedence relations can be used to determine the leftmost prime phrase of a sentential form. If, however, more than one relation holds between a pair of operator symbols, this approach may not work.

Table 7-7 Operator precedence relations for sample grammar

Operator precedence relations for sample grammar

 

 

 

tmp245-18Obtaining precedence relations from syntax trees.

Figure 7-1 Obtaining precedence relations from syntax trees.

Assuming that at most one relation holds between any pair of terminal symbols, there is a straightforward technique for finding the leftmost prime phrase of each sentential form. Let a general sentential form be written as

tmp245-19_thumb

wheretmp245-20_thumbThe term [A,] denotes that the nonterminal At can be present or absent in the sentential form. Note, also, that from the previous discussion two adjacent nonterminals cannot occur in the given sentential form. The leftmost prime phrase of this sentential form is the leftmost substring satisfying the conditions

tmp245-22_thumb

More will be said about this fact in the next subsection. The detection of the leftmost prime phrase c,…cj is based on the context symbols cl_1 and cJ + l, where c,_1 denotes the left context and cJ+1 denotes the right context. In certain instances, we may not have any left and/or right context symbols. In order to prevent this problem, we assume the existence of the special character # such that

tmp245-23_thumb

A given sentential form s is surrounded by instances of the special character, giving the string #s#. We are now guaranteed the existence of left and right context symbols.

Using the example grammar introduced earlier, let us now, in an informal manner, attempt to construct a syntax tree for the sentence /*(; + /). Padded with #’s on both ends, the sentence is #/*(/ + /)#. Scanning this augmented sentence from left to right and using operator precedence relations, it is seen from Table 7-8 that

tmp245-24_thumb

Therefore, / is the leftmost prime phrase of the initial sentential form. Since the precedence table gives no information about nonterminals per se, we are unable to determine which nonterminal we must reduce the phrase to. Consequently the prime phrase i is replaced by the nonterminal symbol Xv This substitution yields the new sentential form

tmp245-25_thumb

By continuing the scan, we obtain the relations

tmp245-26_thumb

This indicates that the leftmost / is a prime phrase. We then reduce this phrase to a nonterminal X2. After this reduction we obtain the revised sentential form

tmp245-27_thumb

The continuation of the scan yields the relations

tmp245-28_thumb

From these relationships it is clear that / is again a prime phrase. This phrase reduces to the nonterminal X3, and the resulting sentential form is

tmp245-29_thumb

At this point

tmp245-30_thumb

so X2 + X3 is :> prime phrase. A reduction of this phrase to X4 produces the new sentential form

tmp245-31_thumb

The relations which now hold are

tmp245-32_thumb

Consequently, we replace the prime phrase by the nonterminal X5. This substitution results in the sentential form

tmp245-33_thumb

Table 7-8 Parse of tmp245-34_thumb 

Step

Sentential form

Leftmost prime phrase

Reduction

tmp245-35 tmp245-36 tmp245-37 tmp245-38 tmp245-39 tmp245-40
tmp245-41 tmp245-42 tmp245-43 tmp245-44 tmp245-45 tmp245-46 tmp245-47
tmp245-48 tmp245-49 tmp245-50 tmp245-51 tmp245-52 tmp245-53 tmp245-54
tmp245-55 tmp245-56 tmp245-57 tmp245-58 tmp245-59 tmp245-60 tmp245-61 tmp245-62
tmp245-63 tmp245-64 tmp245-65 tmp245-66 tmp245-67 tmp245-68 tmp245-69 tmp245-70
tmp245-71 tmp245-72 tmp245-73 tmp245-74 tmp245-75 tmp245-76 tmp245-77 tmp245-78
tmp245-79 tmp245-80 tmp245-81 tmp245-82 tmp245-83 tmp245-84 tmp245-85 tmp245-86

Since

tmp245-87_thumb

a final reduction of the prime phrase ^ ♦ X5 to X6 yields the sentential form

tmp245-88_thumb

The reader probably has noticed that the parse of the given sentence constructed in the previous fnanner is not complete. The syntax tree obtained so far is displayed in Fig. 7-2a, with a summary of the parse given in Table 7-8. This tree is merely a skeleton of the complete syntax tree given in Fig. 7-2b.

The absence of certain portions of the complete derivation is obvious when both trees are compared. The omitted portions of the derivations consist of sentential forms where one nonterminal has been reduced to another nonterminal. Since the precedence relations exist only between terminal symbols, it is not surprising that these omissions occur. Clearly, we can complete the derivation by filling in the gaps; namely, we must complete the following:

tmp245-89_thumb

 

 

Parse trees for the sentence

Figure 7-2 Parse trees for the sentence tmp245-91_thumb

In this example,

tmp245-92_thumb

These partial derivations are obtained easily from the rules of the grammar.

The approach taken in this subsection depends exclusively on the precedence table associated with a grammar. The precedence table for our sample grammar was obtained in an ad hoc manner. It would be desirable to be able to generate such a precedence table in a more formal manner. In fact, this goal can be achieved mechanically by examining the rules of the grammar. We turn now to this formalization.

Next post:

Previous post: