Bottom-Up Parsing (Compiler Writing) Part 3

Formal Definition of Operator Precedence Relations

In the previous subsection the operator precedence relations were obtained by examining syntax trees. We now redefine these relations in terms of the productions of the grammar. It is shown that these relations can be computed easily from the grammar.

Definition 7-2 The operator precedence relations associated with an operator grammar are defined over its terminal alphabet as follows:

tmp245-93_thumb

The leftmost and rightmost terminal characters which are obtainable from the nonterminals for the example grammar of the previous subsection are given in Table 7-9. Formally, the leftmost and rightmost terminal-character derivations of a nonterminal symbol can be defined as follows:

tmp245-94_thumb


The relationtmp245-95_thumbis obtained easily from the grammar in that there is only one rule which contains two terminal symbols; thereforetmp245-96_thumbThe relationtmp245-97_thumbcan be evaluated by first considering the symbol pairstmp245-98_thumb

The pair (E yields the relationstmp245-99_thumband

tmp245-100_thumbThe remaining elements in thetmp245-101_thumbrelation are obtained in an analogous manner. Using a similar approach, thetmp245-102_thumbrelation can be constructed. In particular, we consider the symbol pairstmp245-103_thumbThe pair E) yields the elementstmp245-104_thumbThe remaining elements intmp245-123_thumbare generated by repeating this process on the other symbol pairs.

Table 7-9 Leftmost and rightmost character derivations for sample grammar

Nonterminal

Leftmost terminal

Rightmost terminal

symbol

character

character

P

tmp245-115 tmp245-116

F

tmp245-117 tmp245-118

T

tmp245-119 tmp245-120

E

tmp245-121 tmp245-122

One or more of these relations may hold between any pair of operator symbols. However, if at most one of the three relations holds between any such pair of symbols, an interesting and useful subset of the operator grammar class results.

Definition 7-3 An operator precedence grammar is an operator grammar in which at most one of the relationstmp245-124_thumbholds between any pair of operator symbols. We call the language generated by an operator precedence grammar an operator precedence language.

The precedence relations associated with our sample grammar indicate that this grammar is an operator precedence grammar.

The computation of the three precedence relations can be facilitated by introducing several intermediate relations which are represented by (Boolean) matrices. Since, however, the next subsection deals with a generalization of operator precedence to all symbols of the grammar, we leave the specification of the precedence relations in terms of simpler intermediate relations until later.

In this subsection we have formalized the notions of precedence relations introduced in the last subsection. If at most one operator relation holds between any pair of operator symbols for a given operator grammar, then this grammar is known as an operator precedence grammar. The next step is to specify an algorithm for the parsing of operator precedence grammars.

Operator Precedence Parsing Algorithm

In Sec. 7-2.1 we informally traced through a sample parse based on the operator relationstmp245-125_thumbRecall that the parse obtained was not a full parse but only a skeleton of the complete parse. This phenomenon was the result of not having any information in the precedence relations about the nonterminals of the grammar. Furthermore, the string reduced at each stage in the skeleton parse was the leftmost prime phrase and not the handle. Also recall that a prime phrase must contain at least one terminal symbol. The purpose of this subsection is to formalize this parsing process into the form of a general algorithm.

The leftmost prime phrase in a sentential form of an operator precedence grammar can be detected by using one symbol on each side of a possible prime phrase. In other words, the leftmost prime phrase is determined by scanning the sentential form from left to right examining successive pairs of terminal symbols untiltmp245-126_thumbin this case a is either the tail (last symbol) or the symbol which immediately precedes the tail of the prime phrase, and b is the right context symbol used to find it. Similarly, the head (leftmost symbol) of the prime phrase is found by scanning from the tail of the prime phrase from right to left until tmp245-127_thumbwhere b is now either the head or the symbol which immediately follows the head of the prime phrase and a is the left context symbol. The next theorem, which is stated without proof, is important.

Theorem 7-1 Each sentential form

tmp245-133_thumb

of an operator precedence grammar has a unique leftmost prime phrase which is the leftmost substring

tmp245-134_thumb

that satisfies

tmp245-135_thumb

Note that we are assuming (as we did in Sec. 7-2.1) the presence of a special symbol # such thattmp245-136_thumbfor any terminal symbol. This modification permits the detection of a leftmost prime phrase when it contains the leftmost and/or rightmost terminal in the sentential form.

We wish to formulate a parsing algorithm based on this theorem. We assume that the operator precedence matrix and the productions of the grammar are represented in some suitable form.

The algorithm works in the following manner. The symbols of the given input string are scanned from left to right and placed on a stack 5. This scanning continues until the relation between the symbol at the top or next to the top of the stack and the current input symbol istmp245-137_thumbNote that, since the relation is between terminal symbols only, we have to ignore the symbol on the top of the stack in the case of a nonterminal. In such a case, however, the next symbol in the stack must be a terminal (since in a sentential form of an operator precedence grammar two consecutive nonterminals cannot occur). At this point, the top element of the stack denotes the tail of the leftmost prime phrase. The head of tliis prime phrase is determined by comparing successive pairs of adjacent terminal symbols going down the stack until a pair is encountered that istmp245-138_thumb related. The symbol which immediately precedes the prime phrase is always a terminal. A pair of terminal symbols separated by a nonterminal symbol is considered to be an adjacent pair. This leftmost prime phrase is reduced (i.e., removed from the stack) to the left part of the applicable production (which is then placed on the stack). The entire process is repeated until the current input symbol is the special symbol #. This parsing approach is incorporated in the following algorithm.

Algorithm OPERATOFLPRECEDENCE. Given an operator precedence grammar, its associated precedence matrix, and a certain input sentence t — f1f2… tn which is to be parsed, this algorithm parses this sentence in the manner just described. Let S denote a stack, and let TOP be an index which is used to designate the top element of the stack. The index i is used to locate the leftmost prime phrase on the stack, while k is an index which refers to the current input symbol being scanned. NEXT is a temporary variable which contains the current input symbol. TEMP is a variable which is used in locating the head symbol of a prime phrase.

1. [Initialize]

tmp245-142_thumb

2. [Scan the first input symbol]

tmp245-143_thumb

3. [Parse the input string]

Repeat through step 5 while true

4. [Is stack top an operator?]

tmp245-144_thumb

5. [Tail of leftmost prime phrase?]

tmp245-145_thumb

then (Find the head of the leftmost prime phrase)

tmp245-146_thumb

The first step of the algorithm initializes the stack to contain the special symbol ‘#’. The input string is also padded with the marker symbol ‘#’. Step 2 scans the first input symbol and sets the input string index to a value of 2.

Steps 3 to 5 perform the required parse. The third step seems to contain an infinite loop. This loop is exited in step 5 when the stack contains two elements and the current"juput symbol is ‘#’. When this occurs, the parse is complete. Step 4 determines whether or not the top element in the stack is an operator. If it is not, the second element in the stack is an operator (since a sentential form cannot contain two consecutive nonterminals). The value of variable i reflects that fact. The purpose of step 5 is to determine the presence of a leftmost prime phrase. The presence of such a phrase results in its reduction. We detect a prime phrase by first determining its tail symbol. The conditiontmp245-147_thumbuses the right context symbol, NEXT (the current input symbol), to make this decision. If at this point in the parse a leftmost prime phrase is not in the stack, then we place the current input symbol on the stack and scan the next input symbol. If, however, the test confirms the presence of the leftmost prime phrase on the stack, we then must find the head symbol of this phrase. Such a search entails the checking of each successive pair of operator symbols in the stack. The desired head symbol is found on encountering the first operator pair that is related by the relationtmp245-148_thumbWhen this head symbol is detected, the required reduction is performed. This reduction entails the invocation of a semantic routine to handle that particular prime phrase. There is one semantic routine for each distinct prime phrase. We perform the indicated reduction by replacing the prime phrase in the top portion of the stack by some nonterminal symbol which we denote by ‘N’. Note that when one reduction occurs, it is possible to have other reductions apply at that point without any new input symbol being scanned. The repeat statement in step 5 handles this particular possibility. A trace of the skeleton parse for the input string f =(/+/)*/ of the sample language introduced earlier is given in Table 7-10.

Note that in the previous algorithm no attempt is made to detect and correct errors. These considerations will be discussed in Sec. 7-2.5.

We can represent the operator precedence relations by a precedence matrix P whose element p:j is given as tmp245-151_thumb

A 2-bit entry can represent each element of such a matrix. The zero entry in this matrix can be used in the case of error detection and recovery. This possibility is pursued in Sec. 7-2.5.

Operator precedence parsers have been used in several compiler efforts. Operator precedence grammars have been obtained for languages such as ALGOL 60 and ALGOL W. Another approach that has been used is to have an expression in a language recognized by an operator precedence parser submodule which is part of a recursive-descent parser.

Table 7-10 Trace of algorithm OPERATOR_PRECEDENCE for tmp245-152_thumb 

Unexpended

Stack

Current input

Leftmost

Step

input string

contents

Relation

symbol

prime phrase

0

tmp245-153 tmp245-154 tmp245-155 tmp245-156 tmp245-157

1

tmp245-158 tmp245-159 tmp245-160 tmp245-161 tmp245-162

2

tmp245-163 tmp245-164 tmp245-165 tmp245-166 tmp245-167

3

tmp245-168 tmp245-169 tmp245-170 tmp245-171 tmp245-172

4

tmp245-173 tmp245-174 tmp245-175 tmp245-176 tmp245-177

5

tmp245-178 tmp245-179 tmp245-180 tmp245-181 tmp245-182

6

tmp245-183 tmp245-184 tmp245-185 tmp245-186 tmp245-187

7

tmp245-188 tmp245-189 tmp245-190 tmp245-191 tmp245-192

8

tmp245-193 tmp245-194 tmp245-195 tmp245-196 tmp245-197

9

tmp245-198 tmp245-199 tmp245-200 tmp245-201 tmp245-202

10

tmp245-203 tmp245-204 tmp245-205 tmp245-206 tmp245-207

11

tmp245-208 tmp245-209 tmp245-210 tmp245-211 tmp245-212

12

tmp245-213 tmp245-214 tmp245-215 tmp245-216 tmp245-217

13

tmp245-218 tmp245-219 tmp245-220 tmp245-221 tmp245-222

14

tmp245-223 tmp245-224 tmp245-225 tmp245-226 tmp245-227

For a particular grammar, the size of the precedence matrix varies as the square of the number of operator symbols. In many cases, however, a great reduction in this matrix size can be realized. We explore this possibility in the next subsection.

Precedence Functions in Operator Precedence Parsers

The storage requirements of the precedence matrix require n2 entries where n is the number of operator symbols in the grammar. It may be desirable to try to reduce this number of entries. For some operator precedence grammars, it is sometimes possible o replace the matrix by two precedence functions. In such cases these precedence functions, which we call / and g, are defined as

tmp245-228_thumb

where a and b denote operator symbols. The / and g functions can then be used instead of the precedence matrix in Algorithm OPERATOR_PRECEDENCE discussed earlier. Instead of using the relationtmp245-229_thumbto detect the tail symbol of the leftmost prime phrase on the stack, we use the relationtmp245-230_thumbSimilarly, instead of using the relationtmp245-233_thumbto detect the head symbol of the leftmost prime phrase, we use the relationtmp245-234_thumbThe precedence functions require 2« entries of storage.

We now present briefly in an informal manner a method for generating the precedence functions if they exist. A detailed discussion of the method is given in Sec. 7-3.4 for obtaining precedence functions for simple precedence grammars. In fact, obtaining precedence functions for operator precedence grammars is a special case of the method presented there.

To describe the method for obtaining the precedence functions, the following simplified grammar is used.

tmp245-237_thumb

Its precedence matrix is given in Table 7-11. One method of obtaining a pair of precedence functions involves the generation of a directed graph from the precedence matrix. The graph contains 2n entries labeledtmp245-238_thumb tmp245-239_thumbThe following notation is used in the construction of the graph:

tmp245-242_thumb

We define a directed edge fromtmp245-243_thumbSimilarly, define an edge fromtmp245-244_thumbAssign a number to each node equal to the number of nodes that are reachable from that node, including the node itself. The numbers assigned totmp245-245_thumbare the numbers assigned to their corresponding nodes.

In the current example, n = 5. The graph shown in Fig. 7-3 consists of 10

tmp245-249_thumb

totmp245-252_thumbThe value oftmp245-253_thumbis 2, sincetmp245-254_thumbare both reachable fromtmp245-255_thumbThe same value is assigned totmp245-256_thumbis assigned a value of 6, since 6 nodes are reachable from /(/’).

Table 7-11

Table 7-11

 

 

 

Directed graph for operator precedence matrix of Table 7-11.

Figure 7-3 Directed graph for operator precedence matrix of Table 7-11.

The other functional values can be obtained in an analogous manner. The two precedence functions thus obtained are:

(

i

/

)

/

2

6

4

6

6

g

7

7

3

5

2

It can be verified that these functions are valid by comparing them with the precedence matrix. The details of this check are given in Sec. 7-3.4.

So far, syntax errors have not been mentioned. The next subsection examines the detection of errors.

Next post:

Previous post: