Program Optimization - Crafting a Compiler

Java Reference

In-Depth Information

for i =

1 to N do

for j =

1 to N do

A [ i , j ]

←

0

for k =

1 to N do

7

A [ i , j ]

← A [ i , j ]

+ B [ i , k ]

× C [ k , j ]

8

Figure 14.3: Fusing the loop nests.

the benefits of WPO. Even if a method cannot be inlined, WPO can generate a

version of the invokedmethod that is customized to its calling context. In other

words, the trend toward reusable code can result in systems that are general

but not as e

cient as they could be. Optimizing compilers try to regain some

of the lost performance.

Target-Specific Optimization

Portability ranks high among the goals of today's programming languages.

Ideally, once a program is written in a high-level language, it should be

movable without modification to any computing system that supports the

language. Architected interpretive languages such as the Java Virtual Ma-

chine (JVM) support portability nicely. Any computer that hosts a JVM in-

terpreter can run any Java program. But the question remains—how fast?

Although most modern computers are based on RISC principles, the details of

their instruction sets vary widely. Moreover, various models for a given archi-

tecture can also di

er with regard to their storage hierarchies, their instruction

timings, and their degree of concurrency.

ff

Continuing with our example, the program shown in Figure 14.2 is an

improvement over the version shown in Figure 14.1, but it is possible to obtain

still better performance. Consider the behavior of thematrix Result .Thevalues

of Result are computed in the loop nest at Markers 3 and 4 —one element

at a time. Each of these elements is then copied from Result to A by the loop

nest at Marker 6 . Poor performance can be expected on any non-uniform

memory access (NUMA) system that cannot accommodate Result at its fastest

storage level. Better performance can be obtained if the data is accumulated

in a register and then stored directly in A . Optimizing compilers that feature

loop fusion can identify that the outer two loop nests at Markers 3 and 6

are structurally equivalent. Dependence analysis can show that each element

of Result is computed independently. The loops can then be fused to obtain

the program shown in Figure 14.3 in which the Result matrix is eliminated.

Crafting a Compiler

Search WWH ::

Custom Search

Home