In Depth Tutorials and Information

Bottom-Up Parsing (Compiler Writing) Part 14

COMPARISON OF PARSING METHODS

Throughout the last two topics we have presented several parsing methods Each method is usually associated with a class of grammars. All parsing methods.

That is, the time taken to parse a given input string is proportional to the number of tokens in that string. This section briefly compares most of the parsing techniques introduced earlier. The criteria of comparison include the following:

1. Generality

2. Ease of writing a grammar

3. Ease of debugging a grammar

4. Error detection

5. Space requirements

6. Time requirements

7. Ease of application of semantics

Each criterion will now be discussed in turn.

Generality. An .LR(l) parser generator is the most general type of parser. Figure 7-16 gives the inclusion tree for the classes of formal grammars (and associated parsers) discussed so far. Although LR( 1) grammars are shown as a subset of the LR(k) class, Knuth has shown that any LR(k) grammar can be written as an equivalent LR(l) grammar. There is no great difference between the classes of ZJ?(1) and LALR(l) grammars or SLR( 1) grammars. It has been shown (Thompson, 1977) that the LL{ 1) grammars are a subset of the LALR (I ) class, with the former being a significantly smaller class thin the latter.

Figure 7-16 Grammar class inclusion tree.

The simple precedence class of grammars is also a small subset of the SLR(l) class. Since operator precedence gramars can be ambiguous, this class does not form a subset of the LR classes of grammars.

Ease of writing. To get a given grammar into a simple precedence form can require significant changes to the original grammar. These changes can affect the structure of the original language constructs and the application of semantics. Furthermore, empty productions are not allowed. The rewriting of a given grammar to be LL( 1) is often awkward and difficult because of the problem of left recursion. The rewriting of a grammar to be SLR (I), LALR( 1), or L.R(1) is usually easier to achieve without substantially changing the original grammar. An LL( 1) grammar may often contain more productions than an LR(\) grammar for the same language.

Ease of debugging. Because of the complexity of the construction process to obtain an LR parser, it may be nontrivial (especially for a novice) to eliminate local ambiguities from the given grammar so that it becomes an LR grammar. It is somewhat easier to debug the precedence and LL( 1) grammars. The relations in these methods can, if necessary, be generated by hand.

Error detection. Operator precedence parsers can, in fact, accept erroneous strings. LR( 1) and LL(\) parsers possess the valid prefix property. Errors in both types of parsers can be detected at the earliest possible point without pushing the next character onto the stack. The action table contains useful information for producing suitable error messages and attempting error recovery.

Space requirements. All of the parsers can be represented by tables. The simple precedence and operator precedence parsers have tables whose size depends on the number of vocabulary (VT U VN) and terminal symbols, respectively. The size of LL( 1) parsing tables is generally more efficient than that of precedence parsers. LALR( 1) and SLR (I) parsers, however, are more space-efficient than the precedence parsers.

Typically, an LL( 1) grammar will contain more productions than an LALR(l) grammar for the same language. The elimination of left recursion in LL( 1) grammars frequently requires additional nonterminal symbols. These extra nonterminal symbols adversely affect the size of the LL( 1) parsing tables. From our experience, however, it appears that the space requirements of LL( 1) and LALR(\) parsers are comparable.

Several comparative studies of space requirements have been done. An early study by Lalonde (Lalonde, 1971) compares the table sizes (in bytes) of LALR( 1), MSP, and SP (Wirth and Weber, 1966) parsers for four programming languages.

Part of his results follow:

Grammar	Number of productions	MSP	SP	LALR
XPL	109	3274	*	1250
EULER	120	3922	4321	1606
EULER-11	100	3017	3204	1276
ALGOL 60	173	> 6800*	> 6100*	2821

Observe that the LALR space requirements are substantially less than those for SP and MSP parsers. Furthermore, as to be expected, the space requirements for both SP and MSP parsers are comparable.

Another study (Anderson et al., 1973) compared SLR( 1), SP, and MSP space requirements for the languages ALGOL W and XPL. The ALGOL W compiler, which was developed at Stanford, contains a simple precedence parser which completely eliminates (due to Susan Graham) single productions. Some of their space-requirements results are as follows:

	Number of
Grammar	productions	SLR(l)PC	SLR(1)C	SPC	MSP
XPL	108	1182	_	—	2962
ALGOL W	190	2434	5523	6730	—

SLR(l)PC and SLR(l)C denote the parsers with partial and complete single-production elimination, respectively. SPC denotes the simple precedence parser with complete single-production elimination. The MSP parser did not eliminate single productions.

Note that the space requirements for the SLR (I) parser with partial single-production elimination are substantially less than those of a simple precedence parser with full elimination. Observe, however, that the space requirements of the SLR(l) parser with full elimination are not as impressive. Recall that there are now techniques for eliminating single productions that do not increase the original size of the SLR( 1) tables.

Recall that a matrix form of the parsing tables is usually preferred from a fast-lookup standpoint. One approach that has been taken (Joliat, 1972) is to retain the matrix form. This form, however, is factored into several special-purpose matrices. Although this approach has somewhat increased the storage requirements of the parser very fast time performance has been observed. More will be said about the timing results of the approach shortly.

Time requirements. Comparing different parsing techniques with respect to timing requirements depends on many factors. Some of these factors depend on a particular parsing technique. Other factors depend on the particular hardware system (such as the size of main memory and disk-access speed) and software-system environment (e.g., the efficiency of an operating system).

A timing comparison for several programs has been reported by Lalonde for MSP and LALR parsers. The comparison is summarized in the following table:

Program	Lines	Size Tokens	Number of reductions	MSP	LALR
Compactify	77	439	1,262	0.84	0.52
LCS	3,322	17,369	58,707	28.86	15.88
XCOM	4,241	24,390	66,108	45.35	25.11
SIMPAC	5,577	24,990	92,245	46.11	25.38
DIAL	6,405	32,136	116,803	58.24	32.65
DOSYS	7,291	29,334	81,581	55.58	30.49

In this comparison the LALR parser significantly outperforms the MSP parser. Apparently, looking for a handle on top of a stack in the MSP parser by comparing a substring on top of the stack with a table of sorted right parts of productions takes longer than searching a list represented matrix in the LALR method.

Another study (Anderson et al., 1973) has compared the time requirements of an SLR( 1) parser and the simple precedence (Wirth and Weber) parser used in the ALGOL W compiler. The results obtained for five different programs are as follows:

Program			Parsing method
Program	SLR (I)	SP	SLR(1)PC	SLR(1)C	SPC
Program 1	0.43	0.41	0.35	0.28	0.27
Program 2	1.12	1.09	0.94	0.76	0.75
Program 3	1.64	1.59	1.39	1.09	1.10
Program 4	2.40	2.32	2.02	1.61	1.60
Program 5	4.07	3.92	3.42	2.76	2.75

SPC and SP denote a simple precedence parser with and without single-production elimination, respectively. SLR(1)C, SLR(1)PC, and SLR (I) denote an SLR( 1) parser with complete, partial, and no single-production elimination.

Note that the performances of the SLR (I) and simple precedence parsers with single-production elimination are very nearly the same. These results are not consistent with the study by Lalonde. The simple precedence parser in the ALGOL W compiler has used a hashing function to determine the number of symbols to be replaced on the parse stack while performing a reduction. This strategy has apparently nullified the advantage that an SLR( 1) parser has in knowing in advance what production is to be applied (i.e., without searching the stack). Recall the MSP parser must search a list of productions in order to determine which production applies at each reduction. Also note that single-production elimination has resulted in an increase in parser speed of 50 percent over parsers without elimination.

Anderson et ai’ achieved this speedup in parsing partly by taking advantage of special hardware instructions for performing list searching. This advantage would have been significantly reduced had the list-searching part of the parser been programmed in a higher-level language. Joliat (1973) achieved the same parser speed without taking advantage of specialized hardware features.

Semantic application. Semantic actions are easily incorporated in both LL( 1) and LR parsing action tables. More will be said about this topic in Chap. 11 when attributed grammar translation will be discussed. In bottom-up parsers we usually only associate semantic actions with reduction actions, which thereby results in a cleaner separation of syntax and semantics. Because LL( 1) and LR parsers allow empty productions, these null productions can be used in a beneficial manner both in applying semantic actions and in error detection and recovery. Recall that empty productions are not allowed in precedence grammars.

We have examined the performance and merits of several parsing techniques. In practice, however, the choice of parsing methods used by a compiler writer may well be dictated by what is available on the local computer system. Provided that a reliable parser generator is available and it does not use too many systems resources, the compiler writer may well be indifferent to the actual parsing method used. If, however, there is more than one parser generator available on a given system, factors such as class of grammars accepted, ease of debugging a grammar, space requirements, and parsing speed may become important issues. With the high costs of programmers, good error detection, reporting, and recovery techniques are becoming much more important than in the past. A compiler that possesses good qualities with respect to assisting a programmer to debug programs more quickly is definitely preferred. Several years ago, interest in having good error diagnostics as an aid to debugging programs was of concern primarily in academic institutions. Today, because of high programming costs, all institutions are concerned with increasing programmer productivity in any way possible.

A compiler writer who does not have access to a parser generator can with a relatively small effort write an LL( 1) parser. In fact, the LL( 1) parsing table for an average-sized grammar can be constructed by hand. A straightforward way of getting the job done is to use a recursive-descent parser.

Next post: Symbol-Table-Handling Techniques (Compiler Writing) Part 1

Previous post: Bottom-Up Parsing (Compiler Writing) Part 13

Bottom-Up Parsing (Compiler Writing) Part 14

COMPARISON OF PARSING METHODS

Related Links

:: Search WWH ::