Bottom-Up Parsing (Compiler Writing) Part 4

Error Detection in Operator Precedence Parsers

Errors can be detected in operator precedence parsers by making use of the blank entries in the operator precedence matrix. When such a blank entry is referenced in the parsing algorithm, a routine can be invoked that will report the error. The error message generated depends on the nature of the error.

Table 7-12 Allowable adjacent operator pairs

(

I

-

/

)

( 1

X

X

X

X

X

-

X

X

/

X

X

)

X

X

X

Appropriate and meaningful error messages can be created by a compiler writer who is very familiar with the programming-language syntax. Perhaps the reader should recall at this point the discussion of errors given in Chap. 5.


Errors can also be detected by constructing from the given grammar a table of allowable adjacent operator pairs. In the grammar

tmp245-262_thumb

adjacent character pairs in the right-hand side of the rules can be used to generate the table. For example, the adjacent character pair E – (from the rule E -» E — T) indicates that the adjacent operator pairs i – and ) – are permitted. This follows by noting that the operators i and ) are rightmost derivable from E. Similarly, the character pair (E implies that the operator pairs (( and (/ are also allowable adjacent operator pairs since ( and i are leftmost operator derivable from E. When this process is continued, the set of allowable adjacent operator pairs given in Table 7-12 results. An algorithm to generate such a table is left as an exercise.

It should be noted that since relationships exist only between operator symbols, the phrase structure of the grammar involving nonterminals is essentially ignored. Consequently, nonterminals may not be used to indicate which portions of an input string are in error. Furthermore, since the parser produces only a skeleton parse for an input string, reductions of the form "A -> B" where B is nonterminal do not occur. Therefore, it is possible to accept an input string that is syntactically invalid. A final comment concerns parsers that use precedence functions instead of the operator precedence matrix. Parsers using these functions may take longer to detect an error than if the corresponding precedence matrix had been used. Examples of this possibility are given for simple precedence parsers in the next section.

SIMPLE PRECEDENCE GRAMMARS

Wirth and Weber (1966) alleviated some of the shortcomings of operator precedence grammars by generalizing the notion of precedence to all symbols (i.e., terminal and nonterminal) of a grammar. This generalization leads to the definition of simple precedence grammars. Unlike operator precedence parsers which produce only skeleton parses, simple precedence parsers produce complete parses. Also the phrase structure of a simple precedence grammar is captured in a simple precedence matrix because both terminal and nonterminal symbols can be related. Error detection and recovery can be more easily formalized in simple precedence parsers than is possible in operator precedence parsers.

Notions and Use of Precedence Relations

In this section we are again concerned with a left-to-right bottom-up parsing technique which involves no backup. This technique can be viewed as a generalization of the operator precedence parsing strategy. Unlike the operator precedence parser which detects a leftmost prime phrase, however, this method constructs the parse for a sentence by repeatedly finding the handle in each sentential form. As was previously mentioned, once the handle has been found, we must also determine which production to apply, that is, to which nonterminal the handle should be reduced. This parsing method will work for a class of grammars (which is a proper subset of the context-free grammars) called the simple precedence grammars. Since the proposed method proceeds in a left-to-right bottom-up manner, all the sentential forms will represent the converse of a rightmost derivation. In the remainder of the discussion on precedence grammars, all derivations are understood to be the converse of such a rightmost derivation.

Let us now investigate how the handle in the sentential form might be found. Since the parsing method is left to right, we want to scan the sentential form from left to right, looking at only two adjacent symbols at one time so as to be able to determine the rightmost symbol of the handle which is called its tail. Starting at the tail of the handle, and again using only two adjacent symbols, we will then scan from right to left and find the leftmost symbol in the handle which is called its head. We are then faced with the following problem. Given some sentential form

tmp245-270_thumb

where … denotes a possibly empty string of symbols, four possibilities are obvious: Sj is the tail of the handle, both St and S2 are in the handle, S2 is in the handle but Sj is not, and neither S1 nor S2 is in the handle. This last case is impossible because the grammar is reduced and we are in the process of obtaining the converse of a rightmost derivation.

We would like to develop certain relations from the grammar which would permit us to make such a decision. For any pair of symbols and S2 in the vocabulary of the grammar, assume there is a sentential form … S^Sj… • At some stage in the reverse of the rightmost derivation, either S^ or S2 or both must be in the handle. The following three cases can occur:

1. Sj is part of the handle (actually its tail) and S2 is not. This relationship is denoted astmp245-271_thumbwhich signifies that has precedence over S2 because must be reduced before S2. Formally, iftmp245-272_thumbwhere /8 is the handle  of a and there is a ruletmp245-273_thumbthen is the tail of /?.

2. 5X and S2 are both contained in the handle. This relationship is specified as

tmp245-274_thumbwhich means that both symbols have the same precedence and are to be reduced at the same time. This implies that there is a rule of the form tmp245-275_thumbin the grammar.

3. S2 is contained in the handle (actually its head) and is not. We denote such a relationship astmp245-276_thumbwhich signifies that S2 has precedence over Sj. The grammar must contain a rule of the formtmp245-277_thumb

These three cases can be interpreted pictorially in Fig. 7-4a to c.

Interpretation of precedence relations.

Figure 7-4 Interpretation of precedence relations.

As an example, consider the grammar

tmp245-286_thumbtmp245-287_thumb

is the set of productions

tmp245-288_thumb

and i represents a variable name. The language generated by this grammar is the set of parenthesized arithmetic expressions consisting of variables and the operations of addition and multiplication. Each column given in Fig. 7-5 represents a sentential form, a syntax tree for it, the handle of the sentential form, and the precedence relations that can be derived from the tree after repeated reductions. For example, the handle of the sentential formtmp245-289_thumbin Fig. 7-5 is /, and the relationstmp245-290_thumbhold. When the reductiontmp245-291_thumbis made, F is the handle of the sentential formtmp245-292_thumbwhich yields the relationstmp245-293_thumband

tmp245-294_thumbContinuing in this manner, we can obtain other relations between other pairs of symbols. A matrix which displays all the precedence relations for the example is given in Table 7-13, where a blank entry indicates that no relationship exists between a pair of symbols.

The approach used in Fig. 7-5 has yielded a number of relations for the symbols of the example grammar; however, it can be seen from the table that not all of them were obtained from the two trees considered. We could examine other syntax trees until all such relations were obtained. As was the case in operator precedence grammars these relations can also be obtained rather easily.

If at most one relation holds between any pair of symbols, the precedence relations can be used to determine the handle of a sentential form. If, however, more than one relation holds between a pair of symbols, this approach may not work. In the former case the handle of the sentential for S1S2 Sn is the leftmost substring S,…Sj such that

tmp245-446_thumb

Table 7-13 Precedence relations lor the example grammar

E

U

T

F

V

+

*

i

(

)

E

tmp245-301 tmp245-302 tmp245-303 tmp245-304 tmp245-305 tmp245-306

U

tmp245-307 tmp245-308 tmp245-309 tmp245-310 tmp245-311 tmp245-312

T

tmp245-313 tmp245-314 tmp245-315 tmp245-316 tmp245-317 tmp245-318

F

tmp245-319 tmp245-320 tmp245-321 tmp245-322 tmp245-323 tmp245-324
tmp245-325 tmp245-326 tmp245-327 tmp245-328 tmp245-329 tmp245-330 tmp245-331 tmp245-332 tmp245-333 tmp245-334 tmp245-335
tmp245-336 tmp245-337 tmp245-338 tmp245-339 tmp245-340 tmp245-341 tmp245-342 tmp245-343 tmp245-344 tmp245-345 tmp245-346
tmp245-347 tmp245-348 tmp245-349 tmp245-350 tmp245-351 tmp245-352 tmp245-353 tmp245-354 tmp245-355 tmp245-356 tmp245-357

Table 7-14 Parse for the string i + i* i in the example grammar

Step

Sentential form

Handle

Reduction

Direct derivation obtained

1

tmp245-358 tmp245-359 tmp245-360 tmp245-361
tmp245-362 tmp245-363 tmp245-364 tmp245-365

2

tmp245-366 tmp245-367 tmp245-368 tmp245-369
tmp245-370 tmp245-371 tmp245-372 tmp245-373

3

tmp245-374 tmp245-375 tmp245-376 tmp245-377
tmp245-378 tmp245-379 tmp245-380 tmp245-381

4

tmp245-382 tmp245-383 tmp245-384 tmp245-385
tmp245-386 tmp245-387 tmp245-388 tmp245-389

5

tmp245-390 tmp245-391 tmp245-392 tmp245-393
tmp245-394 tmp245-395 tmp245-396 tmp245-397

6

tmp245-398 tmp245-399 tmp245-400 tmp245-401
tmp245-402 tmp245-403 tmp245-404 tmp245-405

7

tmp245-406 tmp245-407 tmp245-408 tmp245-409
tmp245-410 tmp245-411 tmp245-412 tmp245-413

8

tmp245-414 tmp245-415 tmp245-416 tmp245-417
tmp245-418 tmp245-419 tmp245-420 tmp245-421

9

tmp245-422 tmp245-423 tmp245-424 tmp245-425
tmp245-426 tmp245-427 tmp245-428 tmp245-429

10

tmp245-430 tmp245-431 tmp245-432 tmp245-433
tmp245-434 tmp245-435 tmp245-436 tmp245-437

11

tmp245-438 tmp245-439 tmp245-440 tmp245-441
tmp245-442 tmp245-443 tmp245-444 tmp245-445

As an example, Table 7-14 gives a parse for the sentence i + i * i. Each step in the table gives the sentential form along with the relations that exist between the symbols according to Table 7-13.

Formal Definition of Precedence Relations

In the previous subsection the precedence relations were defined in terms of syntax trees. We now redefine these relations in terms of the productions of the grammar. It will be shown that these relations can be computed easily from the grammar.

We first define two relations associated with any grammar that will be important when defining precedence relations. For some particular grammar and nonterminal U, the set of head symbols of the strings that can be produced by U will be required. The set of such symbols, head((/) is defined as

tmp245-448_thumb

Since the mechanical evaluation of head(t/) may be cumbersome, we redefine it in terms of another relation over a finite set in the following manner. Let L be a relation over the vocabulary such that

ULX if there exists a production

tmp245-449_thumb

The transitive closure (Sec. 2-1.2) of this relation can be obtained

UL+X iff there exists some sequence of rules (at least one) such that

tmp245-450_thumb

It is clear that

tmp245-451_thumb

The reflexive transitive closure L* can be defined as

tmp245-452_thumb

In an analogous manner, the set of tail symbols tail(i/) is defined by

tmp245-453_thumb

Another relation R over the vocabulary is defined as

URX if there exists a production

tmp245-454_thumb

Again R+ and R* are defined in an obvious manner. These relations are easily computed using Warshall’s algorithm. The two basic relations can now be used to define the precedence relations.

Definition 7-4 The precedence relations for a grammar G are defined over its vocabulary as follows:

1.tmp245-455_thumbiff there exists a productiontmp245-456_thumb

2.tmp245-457_thumbiff there exists a productiontmp245-458_thumbsuch that tmp245-459_thumbholds.

3.tmp245-460_thumbis a vocabulary symbol and there is production tmp245-461_thumbsuch thattmp245-462_thumbhold.

The relations L+ and R+ for the example grammar in the previous subsection are given in Table 7-15.

The relation = is easily obtained from the productions of’ the grammar. The relationtmp245-471_thumbcan be evaluated by first considering the symbol pairstmp245-472_thumband (E. The pairtmp245-473_thumbwill yield the relationstmp245-474_thumbThe relationstmp245-475_thumb( are obtained from the term » F. Finally, the pair (E gives the relationstmp245-476_thumbThe relation > is

computed in a similar manner. Using this procedure, Table 7-13 can be easily verified.

tmp245-477_thumbtmp245-478_thumbtmp245-479_thumbtmp245-480_thumbtmp245-481_thumbtmp245-482_thumb

Table 7-15

Nonterminal symbol

L+

R +

E

tmp245-483 tmp245-484

U

tmp245-485 tmp245-486

T

tmp245-487 tmp245-488

V

tmp245-489 tmp245-490

F

tmp245-491 tmp245-492

These precedence relations are actually equivalent to those defined in terms of handles in the previous subsection.

Theorem 7-2tmp245-493_thumbiff the substringtmp245-494_thumbappears in a handle of some sentential form.

Proof Let us first assume thattmp245-495_thumbA syntax tree which has the substring 5XS2 in its handle must be constructed. By Definition 7-4(1) there exists a production of the formtmp245-496_thumbSince the grammar is assumed to be reduced, we havetmp245-497_thumbfor sometmp245-498_thumbwhere S is the starting symbol of the grammar. The desired syntax tree can then be constructed as follows:

1. Construct the syntax tree for the sentential formtmp245-499_thumb

2. Reduce the tree obtained in step 1 until U becomes part of the handle.

3. Modify the tree obtained in step 2 by extending the tree downward with the application of the productiontmp245-500_thumb. Since U was in the handle of the previous tree, the handle of the presently constructed tree is «5,152/8 and we therefore have the required tree. Conversely, if S^Sj is a substring of the handle, then by definition of a handle there must exist a production

tmp245-501_thumb

The proofs for the relationstmp245-502_thumbare left as exercises.

The formal definitions of the three precedence relations given here are equivalent to those given informally in terms of syntax trees in Sec. 7-3.1. Some insight into these definitions can be obtained by looking at their interpretation from a syntax-tree point of view. A simple precedence grammar is defined as follows:

Definition 7-5 A grammar G is called a simple precedence grammar if the following conditions are satisfied:

1. For any pair of symbols in the vocabulary, at most one of the relations tmp245-503_thumbmust hold.

2. No two productions can have the same right-hand side.

3. Empty rules are not allowed.

This class of grammars is called simple because only one symbol on each side of a possible handle is used to determine the presence of the handle. In other words, the handle is determined by scanning the sentential form from left to right examining successive pairs of adjacent symbols until > S2; in this case Sj is the tail of the handle and S2 is the right context symbol used to find it. Similarly, the head of the handle is found by scanning from right to left until < S2, where S2 is now the head of the handle and St is the left context symbol. So the first condition of the definition tells us how to find the handle while the second condition tells us which production to use.

Theorem 7-3 A simple precedence grammar is unambiguous. Furthermore, each sentential form SXS2…Sn has a unique handle which is the leftmost substring S,… Sj that satisfies

tmp245-515_thumb

Note that we are assuming the presence of a special symbol # such that tmp245-516_thumbfor any symbol in the original vocabulary.

Once the precedence relations for a simple precedence grammar have been determined, the parsing of a sentence is straightforward. This is one of the simplest classes of grammars that can be used in a practical manner in compiler writing. The relationstmp245-517_thumbare easily computed from the basic relations L, R, and = . From the definition of the product of relations and the definition for tmp245-518_thumb, we can easily obtain

tmp245-522_thumb

where ° denotes the product operator. Similarly, the relation > can be expressed as

tmp245-523_thumb

where (R+)T is the transpose of R+. These relations can obviously be expressed as bit (Boolean) matrices; the logical capabilities of modern computers make their evaluation easy. Observe that the precedence matrix in the previous discussion has included columns which correspond to the nonterminal symbols of the grammar. The inclusion of these columns facilitates reasonable error recovery. We next turn to the formulation of a parsing algorithm for simple precedence grammars.

Parsing Algorithm for Simple Precedence Grammars

We wish to formulate a general parsing algorithm for any simple precedence grammar. The precedence matrix and the productions of the grammar are represented in some suitable form.

Basically the algorithm works in the following manner. The symbols of the given input string are scanned from left to right and placed on a stack S. This scanning continues until the relation between the symbol at the top of the stack and the next input symbol is >. At this point the top element of the stack denotes the tail of the handle. The head of the handle is determined by comparing successive pairs of adjacent symbols in the stack until a pair is encountered that is < related. This handle can then be reduced (i.e., removed from the stack) to the left part of a production (which is placed on the stack) whose right part is that handle. The entire process is repeated until the stack contains the starting symbol for the grammar and the next input symbol is the special symbol #.

Algorithm SP_PARSE. This is the general parsing algorithm for any simple precedence grammar. By using the precedence matrix and the productions for a particular grammar, this algorithm determines whether or not a given input sentence x,x2… xn is valid. The algorithm uses parse stack S, whose top entry is indicated by TOP. The current input symbol is stored in INPUT. The local variable i is used to find the head of the handle on the stack, and k gives the position of the current input symbol in the input sentence.

tmp245-524_thumb

Table 7-16

Step

Input string

Stack contents

Relation

Input

Handle

0

tmp245-525 tmp245-526 tmp245-527 tmp245-528 tmp245-529

1

tmp245-530 tmp245-531 tmp245-532 tmp245-533 tmp245-534

2

tmp245-535 tmp245-536 tmp245-537 tmp245-538 tmp245-539

3

tmp245-540 tmp245-541 tmp245-542 tmp245-543 tmp245-544

4

tmp245-545 tmp245-546 tmp245-547 tmp245-548 tmp245-549

5

tmp245-550 tmp245-551 tmp245-552 tmp245-553 tmp245-554

6

tmp245-555 tmp245-556 tmp245-557 tmp245-558 tmp245-559

7

tmp245-560 tmp245-561 tmp245-562 tmp245-563 tmp245-564

8

tmp245-565 tmp245-566 tmp245-567 tmp245-568 tmp245-569

9

tmp245-570 tmp245-571 tmp245-572 tmp245-573 tmp245-574

10

tmp245-575 tmp245-576 tmp245-577 tmp245-578 tmp245-579

11

tmp245-580 tmp245-581 tmp245-582 tmp245-583 tmp245-584

12

tmp245-585 tmp245-586 tmp245-587 tmp245-588 tmp245-589

13

tmp245-590 tmp245-591 tmp245-592 tmp245-593 tmp245-594

14

tmp245-595 tmp245-596 tmp245-597 tmp245-598 tmp245-599

15

tmp245-600 tmp245-601 tmp245-602 tmp245-603 tmp245-604

A trace of this algorithm for the example grammar using the sentence i* i + i and Table 7-13 is given in Table 7-16.

Next post:

Previous post: