1. Increasing the clock speed.
2. Putting two CPUs on a chip.
3. Adding functional units.
4. Making the pipeline longer.
5. Using multithreading.
An obvious way to improve performance is to increase the clock speed without changing anything else. Doing this is relatively straightforward and well understood, so each new chip that comes out is generally slightly faster than its predecessor. Unfortunately, a faster clock has two main drawbacks that limit how great an increase can be tolerated. First, a faster clock uses more energy, which is a huge problem for notebook computers and other battery-powered devices. Second, the extra energy input means the chip gets hotter and there is more heat to dissipate.
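A commonly cited first-order approximation makes this energy cost concrete: the dynamic power dissipated by CMOS logic scales roughly as

    P \approx C \, V^{2} \, f

where C is the switched capacitance, V the supply voltage, and f the clock frequency. Since a higher clock frequency often also requires a higher supply voltage to be sustained, power (and therefore heat) can grow faster than linearly with clock speed.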
Putting two CPUs on a chip is relatively straightforward, but it comes close to doubling the chip area if each one has its own caches, and thus reduces the number of chips per wafer by a factor of two, which essentially doubles the unit manufacturing cost. If the two CPUs share a common cache as big as the original one, the chip area is not doubled, but the cache size per CPU is halved, cutting into performance. Also, while high-end server applications can often fully utilize multiple CPUs, not all desktop applications have enough inherent parallelism to warrant two full CPUs.
Adding additional functional units is also fairly easy, but it is important to get the balance right. Having 10 ALUs does little good if the chip is incapable of feeding instructions into the pipeline fast enough to keep them all busy.
A longer pipeline with more stages, each doing a smaller piece of work in a shorter time period, increases performance but also magnifies the negative effects of branch mispredictions, cache misses, interrupts, and other factors that disrupt normal pipeline flow. Furthermore, to take full advantage of a longer pipeline, the clock speed has to be increased, which means more energy is consumed and more heat is produced.
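To see how disrupted pipeline flow shows up in practice, here is a minimal sketch (an illustrative benchmark, not taken from the text) in C. It sums the array elements above a threshold, first on randomly ordered data and then on sorted data. On a deeply pipelined CPU the unsorted pass is typically much slower, because the data-dependent branch is frequently mispredicted and the pipeline must be flushed and refilled.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000

/* Sum all elements greater than 127; the if() is a data-dependent branch. */
static long long sum_above(const unsigned char *a, size_t n)
{
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > 127)
            total += a[i];
    return total;
}

static int cmp(const void *x, const void *y)
{
    return *(const unsigned char *)x - *(const unsigned char *)y;
}

int main(void)
{
    unsigned char *a = malloc(N);
    if (a == NULL)
        return 1;
    for (size_t i = 0; i < N; i++)
        a[i] = rand() & 0xFF;          /* random data: branch outcome is unpredictable */

    clock_t t0 = clock();
    long long s1 = sum_above(a, N);    /* many branch mispredictions expected */
    clock_t t1 = clock();

    qsort(a, N, 1, cmp);               /* sorting makes the branch highly predictable */
    clock_t t2 = clock();
    long long s2 = sum_above(a, N);
    clock_t t3 = clock();

    printf("unsorted: sum %lld in %.3f s\n", s1, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("sorted:   sum %lld in %.3f s\n", s2, (double)(t3 - t2) / CLOCKS_PER_SEC);
    free(a);
    return 0;
}

Compiled with a modest optimization level (e.g., gcc -O1), the unsorted pass is usually several times slower; an aggressively optimizing compiler may convert the branch to a conditional move or vectorize the loop, which hides the effect.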
Finally, multithreading can be added. Its value lies in letting a second thread utilize hardware that would otherwise lie fallow. After some experimentation, it became clear that a 5% increase in chip area for multithreading support gave a 25% performance gain in many applications, making this a good choice. Intel's first multithreaded CPU was the Xeon in 2002, and multithreading was later added to the Pentium 4, starting with the 3.06-GHz version and continuing with faster versions, as well as later processors, including the Core i7. Intel calls the implementation of multithreading used in its processors hyperthreading.
The basic idea is to allow two threads (or possibly processes, since the CPU cannot tell what is a thread and what is a process) to run at once. To the operating system, a hyperthreaded chip looks like two CPUs that happen to share a common cache and main memory.
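As a minimal sketch of this view from software (assuming a Linux or other POSIX-like system where sysconf(_SC_NPROCESSORS_ONLN) is available; none of these calls appear in the text above), the C program below asks the operating system how many logical processors it sees and then starts two threads that run at the same time. On a hyperthreaded single-core chip the reported count is 2, even though there is only one physical core.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Each thread just burns some CPU time so both can be scheduled at once. */
static void *spin(void *arg)
{
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 100000000UL; i++)
        x += i;
    printf("thread %ld done\n", (long)arg);
    return NULL;
}

int main(void)
{
    /* The OS reports one "processor" per hardware thread, not per physical core. */
    long logical_cpus = sysconf(_SC_NPROCESSORS_ONLN);
    printf("logical processors visible to the OS: %ld\n", logical_cpus);

    pthread_t t1, t2;
    pthread_create(&t1, NULL, spin, (void *)1L);
    pthread_create(&t2, NULL, spin, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Compile with cc prog.c -pthread. On Linux, /proc/cpuinfo or the lscpu command distinguishes physical cores from the hardware threads the operating system schedules onto.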