system, a hyperthreaded Core i7 chip looks like a dual processor in which both
CPUs happen to share a common cache and main memory. The operating system
schedules the threads independently. If two applications are running, the operating
system can run one on each virtual CPU simultaneously. For example, if a
mail daemon is sending or receiving email in the background while a user is
interacting with some program in the foreground, the daemon and the user program
can run in parallel, as though there were two CPUs available.
Application software that has been designed to run as multiple threads can use
both virtual CPUs. For example, video editing programs usually allow users to
specify certain filters to apply to each frame in some range. These filters can
modify the brightness, contrast, color balance, or other properties of each frame. The
program can then assign one CPU to process the even-numbered frames and the
other to process the odd-numbered frames. The two can then run in parallel.
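The even/odd division of labor described above can be sketched in a few lines of Python. The `brighten` filter here is a hypothetical stand-in for a real video filter, and note that in CPython the global interpreter lock prevents pure-Python compute from truly running in parallel, so a real editor would do the per-frame work in native code; the point of the sketch is only how the frames are split between two threads:

```python
from threading import Thread

def brighten(frame, amount):
    """Hypothetical filter: raise the brightness of one frame (a list of pixels)."""
    return [min(255, p + amount) for p in frame]

def process(frames, indices, amount, out):
    """Apply the filter to the frames at the given indices, storing results in out."""
    for i in indices:
        out[i] = brighten(frames[i], amount)

# Ten tiny "frames" of four pixels each.
frames = [[10 * i] * 4 for i in range(10)]
out = [None] * len(frames)

# One thread takes the even-numbered frames, the other the odd-numbered ones.
t_even = Thread(target=process, args=(frames, range(0, len(frames), 2), 20, out))
t_odd = Thread(target=process, args=(frames, range(1, len(frames), 2), 20, out))
t_even.start(); t_odd.start()
t_even.join(); t_odd.join()
```

Because no frame is touched by both threads, the two workers need no locking and can proceed entirely independently.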
Since the two threads share all the hardware resources, a strategy is needed to
manage the sharing. Intel identified four useful strategies for resource sharing in
conjunction with hyperthreading: resource duplication, partitioned resources,
threshold sharing, and full sharing. We will now touch on each of these in turn.
To start with, some resources are duplicated just for threading. For example,
since each thread has its own flow of control, a second program counter had to be
added. The table that maps the architectural registers (EAX, EBX, etc.) onto the
physical registers also had to be duplicated, as did the interrupt controller, since the
threads can be independently interrupted.
Next we have partitioned resource sharing, in which the hardware resources
are rigidly divided between the threads. For example, if the CPU has a queue
between two functional pipeline stages, half the slots could be dedicated to thread 1
and the other half to thread 2. Partitioning is easy to accomplish, has no overhead,
and keeps the threads out of each other's hair. If all the resources are partitioned,
we effectively have two separate CPUs. On the down side, it can easily happen
that at some point one thread is not using some of its resources that the other one
wants but is forbidden from accessing. As a consequence, resources that could
have been used productively lie idle.
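The drawback of rigid partitioning can be illustrated with a small sketch (the class name, queue size, and two-thread split are invented for illustration): each thread may claim slots only from its own half of a queue, so one thread can be refused a slot while the other thread's slots sit idle.

```python
class PartitionedQueue:
    """A queue whose slots are rigidly split between two hardware threads."""

    def __init__(self, total_slots):
        self.quota = total_slots // 2   # fixed share of slots per thread
        self.used = [0, 0]              # slots currently held by each thread

    def try_enqueue(self, thread_id):
        # A thread may only claim slots from its own partition, even if
        # the other thread's partition is sitting completely idle.
        if self.used[thread_id] < self.quota:
            self.used[thread_id] += 1
            return True
        return False

q = PartitionedQueue(8)            # 4 slots for each thread
while q.try_enqueue(1):            # thread 1 fills its own partition...
    pass
refused = q.try_enqueue(1)         # ...and is then refused,
idle_slots = q.quota - q.used[0]   # even though thread 0's 4 slots lie idle
```

A fully shared queue would have granted thread 1 those idle slots, which is exactly the trade-off the next scheme makes.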
The opposite of partitioned sharing is full resource sharing. When this
scheme is used, either thread can acquire any resources it needs, first come, first
served. However, imagine a fast thread consisting primarily of additions and
subtractions and a slow thread consisting primarily of multiplications and
divisions. If instructions are fetched from memory faster than multiplications and
divisions can be carried out, the backlog of instructions fetched for the slow
thread and queued but not yet fed into the pipeline will grow in time.
Eventually, this backlog will occupy the entire instruction queue, bringing the
fast thread to a halt for lack of space in the instruction queue. Full resource
sharing solves the problem of a resource lying idle while another thread wants it, but
creates a new problem of one thread potentially hogging so many resources that it
slows the other one down or stops it altogether.
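The backlog effect can be imitated with a tiny cycle-by-cycle simulation. All the numbers here are invented for illustration (an 8-slot queue, a fetch unit that alternates between threads, and a slow thread that retires one instruction every four cycles); real front ends are far more elaborate, but the qualitative behavior is the same: the shared queue fills up with the slow thread's instructions, and the fast thread's progress drops.

```python
from collections import deque

QUEUE_SIZE = 8
CYCLES = 200
queue = deque()      # shared instruction queue, filled first come, first served
retired = [0, 0]     # instructions completed by each thread

for cycle in range(CYCLES):
    # Execute stage: the fast thread (0) retires one of its queued instructions
    # every cycle; the slow thread (1) retires one only every fourth cycle.
    if 0 in queue:
        queue.remove(0)
        retired[0] += 1
    if cycle % 4 == 0 and 1 in queue:
        queue.remove(1)
        retired[1] += 1
    # Fetch stage: the front end alternates between threads, inserting one
    # instruction per cycle as long as the shared queue has a free slot.
    tid = cycle % 2
    if len(queue) < QUEUE_SIZE:
        queue.append(tid)

print("final queue contents:", list(queue))   # dominated by slow-thread instructions
print("retired (fast, slow):", retired)
```

After the queue fills, the fast thread can make progress only when the slow thread retires something and frees a slot, so its throughput collapses to roughly the slow thread's rate, which is the hogging problem described above.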