An intermediate scheme is threshold sharing, in which a thread can acquire
resources dynamically (no fixed partitioning) but only up to some maximum. For
resources that are replicated, this approach allows flexibility without the danger
that one thread will starve due to its inability to acquire any of the resource. If, for
example, no thread can acquire more than 3/4 of the instruction queue, no matter
what the slow thread does, the fast thread will be able to run. The Core i7
hyperthreading uses different sharing strategies for different resources in an at-
tempt to address the various problems alluded to above. Duplication is used for re-
sources that each thread requires all the time, such as the program counter, register
map, and interrupt controller. Duplicating these resources increases the chip area
by only 5%, a modest price to pay for multithreading. Resources available in such
abundance that there is no danger of one thread capturing them all, such as cache
lines, are fully shared in a dynamic way. On the other hand, resources that control
the operation of the pipeline, such as the various queues within the pipeline, are
partitioned, giving each thread half of the slots. The main pipeline of the Sandy
Bridge microarchitecture used in the Core i7 is illustrated in Fig. 8-9, with the
white and gray boxes indicating how the resources are allocated between the white
and gray threads.
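The threshold-sharing idea can be made concrete with a small sketch. The class below models a shared queue in which any thread may claim slots dynamically, but never more than the 3/4 ceiling used in the instruction-queue example above. The class and method names are invented for illustration; real hardware does this with counters and comparators, not software.

```python
# Sketch of threshold sharing: a pool of slots shared dynamically among
# threads, with a per-thread ceiling so no thread can starve the others.
# The 3/4 fraction matches the instruction-queue example in the text.

class ThresholdQueue:
    def __init__(self, slots, max_fraction=0.75):
        self.slots = slots                       # total entries in the queue
        self.cap = int(slots * max_fraction)     # per-thread ceiling (threshold)
        self.used = {}                           # thread id -> slots currently held

    def acquire(self, tid):
        if sum(self.used.values()) >= self.slots:
            return False                         # queue physically full
        if self.used.get(tid, 0) >= self.cap:
            return False                         # thread has hit its threshold
        self.used[tid] = self.used.get(tid, 0) + 1
        return True

    def release(self, tid):
        if self.used.get(tid, 0) > 0:
            self.used[tid] -= 1
```

With 32 slots and a 3/4 threshold, a stalled thread that never releases anything can hold at most 24 slots, so the other thread is always able to acquire the remaining 8 and make progress.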
Figure 8-9. Resource sharing between threads in the Core i7 microarchitecture. (The figure shows the pipeline stages in order: PC, I-cache and micro-op cache, fetch queue, allocate/renaming, reorder buffer, scheduler, registers, execution, D-cache, register write, and retirement queue, with white and gray boxes marking how each resource is allocated between the two threads.)
In this figure we can see that all the queues are partitioned, with half the slots in each queue reserved for each thread. This way, neither thread can choke off the other. The register allocator and renamer are also partitioned. The scheduler is
dynamically shared, but with a threshold, to prevent either thread from claiming all
of the slots. The remaining pipeline stages are fully shared.
All is not sweetness and light with multithreading, however. There is also a
downside. While partitioning is cheap, dynamic sharing of any resource, especially with a limit on how much a thread can take, requires bookkeeping at run time to
monitor usage. In addition, situations can arise in which programs work much
worse with multithreading than without it. For example, imagine two threads that
each need 3/4 of the cache to function well. Run separately, each one works fine
and has few (expensive) cache misses. Run together, each one has numerous cache
misses and the net result may be far worse than without multithreading.
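The cache-interference scenario can be checked with a rough simulation: two threads, each sweeping a working set equal to 3/4 of the cache, run against a simple LRU cache either alone or interleaved. All sizes here are illustrative, and the LRU model is far simpler than a real set-associative cache, but the qualitative effect matches the text.

```python
# Rough simulation of the cache-interference example: each thread's working
# set (3/4 of the cache) fits when run alone, but the combined working set
# (1.5x the cache) thrashes when the threads share the cache.

from collections import OrderedDict

def miss_rate(refs, cache_lines):
    cache, misses = OrderedDict(), 0
    for addr in refs:
        if addr in cache:
            cache.move_to_end(addr)              # refresh LRU position on a hit
        else:
            misses += 1
            if len(cache) >= cache_lines:
                cache.popitem(last=False)        # evict the least recently used line
            cache[addr] = True
    return misses / len(refs)

CACHE = 1024
ws = 3 * CACHE // 4                              # 3/4 of the cache per thread

# Each thread repeatedly sweeps its own working set.
t0 = [("t0", i) for i in range(ws)] * 8
t1 = [("t1", i) for i in range(ws)] * 8

alone = miss_rate(t0, CACHE)                     # working set fits: only cold misses
together = [x for pair in zip(t0, t1) for x in pair]
shared = miss_rate(together, CACHE)              # cyclic 1.5x working set: thrashing
```

Run alone, a thread's only misses are the cold ones on its first sweep; interleaved, the cyclic combined working set exceeds the cache and nearly every reference misses.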