That's your problem.
This is not necessarily obvious and is highly dependent upon the individual compiler. If you just
use the usual bytecode compiler, there are no particular issues--your program will run at a
nominal speed on any platform. With a JIT compiler or an adaptive compiler such as HotSpot, the
compiler is able to take advantage of specific instructions on individual machines, so you should
expect much better performance without having to do anything extra to obtain it.
C Compiler Optimization
By contrast, let's consider what you need to do for optimal performance of a C program. You need
to select the individual machine to compile for. For example, Sun supports SS1s and SS2s (both
SPARC version 7 machines, which trap to the kernel to handle the integer multiply instruction),
SS10s, SS20s, SS1000s, and SC2000s (all SPARC version 8 machines, which have hardware
integer multiply); and Ultras (SPARC version 9 machines, which have 64-bit registers and 64-bit
operations). Optimizing for an Ultra might produce lousy code for an SS1. Optimizing for an SS1
will produce OK code for an SS10 or Ultra. (This is a marketing decision, of course.)
You need to choose the optimization level for your program. You may choose different levels for
different modules! Sun compilers, for example, provide five levels of optimization. Level -xO2 is
the normal good optimization level, producing fairly tight code, highly reliable and highly correct.
Levels 3, 4, and 5 produce extremely fast code (it may be larger), which is much faster than -xO2
in some cases and possibly slower in others. They are much more likely to fail (i.e., not compile at
all).
Thus, expect to compile, test, and profile your program at -xO2 (the default). Separate out the
most time-consuming functions and recompile them at higher levels. If they work and are
faster, great. If not, too bad.
Java Compiler Optimization
Java compilers do not in general have anything similar to the switches in C, and you often have no
options at all.
Buy Enough RAM
Test the program with different amounts of memory and select the best price/performance level.
Organize your data so that when you do read a disk block, you make maximum use of it and you
don't have to read it again. One obvious technique is to use mmap() to map files into the
address space instead of calling read(). This eliminates an extra kernel memory copy and allows
you to give access-pattern hints to the OS.
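As a rough illustration of the mmap() approach, here is a minimal sketch assuming a POSIX system. The function name count_newlines_mmap is hypothetical, and posix_madvise() is used here as one standard way to pass an access-pattern hint; the text does not say which call is intended.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* request posix_madvise() on strict compilers */
#endif

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in a file by mapping it into the
   address space instead of copying it through read(). Returns -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {        /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping stays valid after close() */
    if (data == MAP_FAILED)
        return -1;

    /* Access-pattern hint: the loop below scans front to back. The call is
       advisory only, so its return value is deliberately ignored. */
    (void)posix_madvise((void *)data, len, POSIX_MADV_SEQUENTIAL);

    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            count++;

    munmap((void *)data, len);
    return count;
}
```

A caller would simply write something like long n = count_newlines_mmap("input.txt"); and treat a negative result as an error. The scan touches each page once, in order, which is the pattern the sequential hint describes.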
Again, Java does not have any such options. The only way to get data into the program is to call
read(). Unfortunately, system calls are particularly expensive in Java because Java must do a lot
of setup before calling the native code, so I/O in Java is significantly slower than even regular I/O
in C.
Minimize Cache Misses