Formal Definition of Operator Precedence Relations
In the previous subsection the operator precedence relations were obtained by examining syntax trees. We now redefine these relations in terms of the productions of the grammar. It is shown that these relations can be computed easily from the grammar.
Definition 7-2 The operator precedence relations associated with an operator grammar are defined over its terminal alphabet as follows:
The leftmost and rightmost terminal characters which are obtainable from the nonterminals for the example grammar of the previous subsection are given in Table 7-9. Formally, the leftmost and rightmost terminal-character derivations of a nonterminal symbol can be defined as follows:
The relationis obtained easily from the grammar in that there is only one rule which contains two terminal symbols; thereforeThe relationcan be evaluated by first considering the symbol pairs
The pair (E yields the relationsand
The remaining elements in therelation are obtained in an analogous manner. Using a similar approach, therelation can be constructed. In particular, we consider the symbol pairsThe pair E) yields the elementsThe remaining elements inare generated by repeating this process on the other symbol pairs.
Table 7-9 Leftmost and rightmost character derivations for sample grammar
Nonterminal |
Leftmost terminal |
Rightmost terminal |
symbol |
character |
character |
P |
||
F |
||
T |
||
E |
One or more of these relations may hold between any pair of operator symbols. However, if at most one of the three relations holds between any such pair of symbols, an interesting and useful subset of the operator grammar class results.
Definition 7-3 An operator precedence grammar is an operator grammar in which at most one of the relationsholds between any pair of operator symbols. We call the language generated by an operator precedence grammar an operator precedence language.
The precedence relations associated with our sample grammar indicate that this grammar is an operator precedence grammar.
The computation of the three precedence relations can be facilitated by introducing several intermediate relations which are represented by (Boolean) matrices. Since, however, the next subsection deals with a generalization of operator precedence to all symbols of the grammar, we leave the specification of the precedence relations in terms of simpler intermediate relations until later.
In this subsection we have formalized the notions of precedence relations introduced in the last subsection. If at most one operator relation holds between any pair of operator symbols for a given operator grammar, then this grammar is known as an operator precedence grammar. The next step is to specify an algorithm for the parsing of operator precedence grammars.
Operator Precedence Parsing Algorithm
In Sec. 7-2.1 we informally traced through a sample parse based on the operator relationsRecall that the parse obtained was not a full parse but only a skeleton of the complete parse. This phenomenon was the result of not having any information in the precedence relations about the nonterminals of the grammar. Furthermore, the string reduced at each stage in the skeleton parse was the leftmost prime phrase and not the handle. Also recall that a prime phrase must contain at least one terminal symbol. The purpose of this subsection is to formalize this parsing process into the form of a general algorithm.
The leftmost prime phrase in a sentential form of an operator precedence grammar can be detected by using one symbol on each side of a possible prime phrase. In other words, the leftmost prime phrase is determined by scanning the sentential form from left to right examining successive pairs of terminal symbols untilin this case a is either the tail (last symbol) or the symbol which immediately precedes the tail of the prime phrase, and b is the right context symbol used to find it. Similarly, the head (leftmost symbol) of the prime phrase is found by scanning from the tail of the prime phrase from right to left until where b is now either the head or the symbol which immediately follows the head of the prime phrase and a is the left context symbol. The next theorem, which is stated without proof, is important.
Theorem 7-1 Each sentential form
of an operator precedence grammar has a unique leftmost prime phrase which is the leftmost substring
that satisfies
Note that we are assuming (as we did in Sec. 7-2.1) the presence of a special symbol # such thatfor any terminal symbol. This modification permits the detection of a leftmost prime phrase when it contains the leftmost and/or rightmost terminal in the sentential form.
We wish to formulate a parsing algorithm based on this theorem. We assume that the operator precedence matrix and the productions of the grammar are represented in some suitable form.
The algorithm works in the following manner. The symbols of the given input string are scanned from left to right and placed on a stack 5. This scanning continues until the relation between the symbol at the top or next to the top of the stack and the current input symbol isNote that, since the relation is between terminal symbols only, we have to ignore the symbol on the top of the stack in the case of a nonterminal. In such a case, however, the next symbol in the stack must be a terminal (since in a sentential form of an operator precedence grammar two consecutive nonterminals cannot occur). At this point, the top element of the stack denotes the tail of the leftmost prime phrase. The head of tliis prime phrase is determined by comparing successive pairs of adjacent terminal symbols going down the stack until a pair is encountered that is related. The symbol which immediately precedes the prime phrase is always a terminal. A pair of terminal symbols separated by a nonterminal symbol is considered to be an adjacent pair. This leftmost prime phrase is reduced (i.e., removed from the stack) to the left part of the applicable production (which is then placed on the stack). The entire process is repeated until the current input symbol is the special symbol #. This parsing approach is incorporated in the following algorithm.
Algorithm OPERATOFLPRECEDENCE. Given an operator precedence grammar, its associated precedence matrix, and a certain input sentence t — f1f2… tn which is to be parsed, this algorithm parses this sentence in the manner just described. Let S denote a stack, and let TOP be an index which is used to designate the top element of the stack. The index i is used to locate the leftmost prime phrase on the stack, while k is an index which refers to the current input symbol being scanned. NEXT is a temporary variable which contains the current input symbol. TEMP is a variable which is used in locating the head symbol of a prime phrase.
1. [Initialize]
2. [Scan the first input symbol]
3. [Parse the input string]
Repeat through step 5 while true
4. [Is stack top an operator?]
5. [Tail of leftmost prime phrase?]
then (Find the head of the leftmost prime phrase)
The first step of the algorithm initializes the stack to contain the special symbol ‘#’. The input string is also padded with the marker symbol ‘#’. Step 2 scans the first input symbol and sets the input string index to a value of 2.
Steps 3 to 5 perform the required parse. The third step seems to contain an infinite loop. This loop is exited in step 5 when the stack contains two elements and the current"juput symbol is ‘#’. When this occurs, the parse is complete. Step 4 determines whether or not the top element in the stack is an operator. If it is not, the second element in the stack is an operator (since a sentential form cannot contain two consecutive nonterminals). The value of variable i reflects that fact. The purpose of step 5 is to determine the presence of a leftmost prime phrase. The presence of such a phrase results in its reduction. We detect a prime phrase by first determining its tail symbol. The conditionuses the right context symbol, NEXT (the current input symbol), to make this decision. If at this point in the parse a leftmost prime phrase is not in the stack, then we place the current input symbol on the stack and scan the next input symbol. If, however, the test confirms the presence of the leftmost prime phrase on the stack, we then must find the head symbol of this phrase. Such a search entails the checking of each successive pair of operator symbols in the stack. The desired head symbol is found on encountering the first operator pair that is related by the relationWhen this head symbol is detected, the required reduction is performed. This reduction entails the invocation of a semantic routine to handle that particular prime phrase. There is one semantic routine for each distinct prime phrase. We perform the indicated reduction by replacing the prime phrase in the top portion of the stack by some nonterminal symbol which we denote by ‘N’. Note that when one reduction occurs, it is possible to have other reductions apply at that point without any new input symbol being scanned. The repeat statement in step 5 handles this particular possibility. A trace of the skeleton parse for the input string f =(/+/)*/ of the sample language introduced earlier is given in Table 7-10.
Note that in the previous algorithm no attempt is made to detect and correct errors. These considerations will be discussed in Sec. 7-2.5.
We can represent the operator precedence relations by a precedence matrix P whose element p:j is given as
A 2-bit entry can represent each element of such a matrix. The zero entry in this matrix can be used in the case of error detection and recovery. This possibility is pursued in Sec. 7-2.5.
Operator precedence parsers have been used in several compiler efforts. Operator precedence grammars have been obtained for languages such as ALGOL 60 and ALGOL W. Another approach that has been used is to have an expression in a language recognized by an operator precedence parser submodule which is part of a recursive-descent parser.
Table 7-10 Trace of algorithm OPERATOR_PRECEDENCE for
|
Unexpended |
Stack |
|
Current input |
Leftmost |
Step |
input string |
contents |
Relation |
symbol |
prime phrase |
0 |
|||||
1 |
|||||
2 |
|||||
3 |
|||||
4 |
|||||
5 |
|||||
6 |
|||||
7 |
|||||
8 |
|||||
9 |
|||||
10 |
|||||
11 |
|||||
12 |
|||||
13 |
|||||
14 |
For a particular grammar, the size of the precedence matrix varies as the square of the number of operator symbols. In many cases, however, a great reduction in this matrix size can be realized. We explore this possibility in the next subsection.
Precedence Functions in Operator Precedence Parsers
The storage requirements of the precedence matrix require n2 entries where n is the number of operator symbols in the grammar. It may be desirable to try to reduce this number of entries. For some operator precedence grammars, it is sometimes possible o replace the matrix by two precedence functions. In such cases these precedence functions, which we call / and g, are defined as
where a and b denote operator symbols. The / and g functions can then be used instead of the precedence matrix in Algorithm OPERATOR_PRECEDENCE discussed earlier. Instead of using the relationto detect the tail symbol of the leftmost prime phrase on the stack, we use the relationSimilarly, instead of using the relationto detect the head symbol of the leftmost prime phrase, we use the relationThe precedence functions require 2« entries of storage.
We now present briefly in an informal manner a method for generating the precedence functions if they exist. A detailed discussion of the method is given in Sec. 7-3.4 for obtaining precedence functions for simple precedence grammars. In fact, obtaining precedence functions for operator precedence grammars is a special case of the method presented there.
To describe the method for obtaining the precedence functions, the following simplified grammar is used.
Its precedence matrix is given in Table 7-11. One method of obtaining a pair of precedence functions involves the generation of a directed graph from the precedence matrix. The graph contains 2n entries labeled The following notation is used in the construction of the graph:
We define a directed edge fromSimilarly, define an edge fromAssign a number to each node equal to the number of nodes that are reachable from that node, including the node itself. The numbers assigned toare the numbers assigned to their corresponding nodes.
In the current example, n = 5. The graph shown in Fig. 7-3 consists of 10
toThe value ofis 2, sinceare both reachable fromThe same value is assigned tois assigned a value of 6, since 6 nodes are reachable from /(/’).
Table 7-11
Figure 7-3 Directed graph for operator precedence matrix of Table 7-11.
The other functional values can be obtained in an analogous manner. The two precedence functions thus obtained are:
|
( |
i |
— |
/ |
) |
/ |
2 |
6 |
4 |
6 |
6 |
g |
7 |
7 |
3 |
5 |
2 |
It can be verified that these functions are valid by comparing them with the precedence matrix. The details of this check are given in Sec. 7-3.4.
So far, syntax errors have not been mentioned. The next subsection examines the detection of errors.