An intermediate scheme is threshold sharing, in which a thread can acquire
resources dynamically (no fixed partitioning) but only up to some maximum. For
resources that are replicated, this approach allows flexibility without the danger
that one thread will starve due to its inability to acquire any of the resource. If, for
example, no thread can acquire more than 3/4 of the instruction queue, no matter
what the slow thread does, the fast thread will be able to run. The Core i7
hyperthreading uses different sharing strategies for different resources in an at-
tempt to address the various problems alluded to above. Duplication is used for re-
sources that each thread requires all the time, such as the program counter, register
map, and interrupt controller. Duplicating these resources increases the chip area
by only 5%, a modest price to pay for multithreading. Resources available in such
abundance that there is no danger of one thread capturing them all, such as cache
lines, are fully shared in a dynamic way. On the other hand, resources that control
the operation of the pipeline, such as the various queues within the pipeline, are
partitioned, giving each thread half of the slots. The main pipeline of the Sandy
Bridge microarchitecture used in the Core i7 is illustrated in Fig. 8-9, with the
white and gray boxes indicating how the resources are allocated between the white
and gray threads.
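The threshold-sharing idea can be made concrete with a small sketch. The class below models a shared queue in which any thread may claim slots dynamically, but never more than the 3/4 ceiling used in the instruction-queue example above. The class and method names are invented for illustration; real hardware does this with counters and comparators, not software.

```python
# Sketch of threshold sharing: a pool of slots shared dynamically among
# threads, with a per-thread ceiling so no thread can starve the others.
# The 3/4 fraction matches the instruction-queue example in the text.

class ThresholdQueue:
    def __init__(self, slots, max_fraction=0.75):
        self.slots = slots                       # total entries in the queue
        self.cap = int(slots * max_fraction)     # per-thread ceiling (threshold)
        self.used = {}                           # thread id -> slots currently held

    def acquire(self, tid):
        if sum(self.used.values()) >= self.slots:
            return False                         # queue physically full
        if self.used.get(tid, 0) >= self.cap:
            return False                         # thread has hit its threshold
        self.used[tid] = self.used.get(tid, 0) + 1
        return True

    def release(self, tid):
        if self.used.get(tid, 0) > 0:
            self.used[tid] -= 1
```

With 32 slots and a 3/4 threshold, a stalled thread that never releases anything can hold at most 24 slots, so the other thread is always able to acquire the remaining 8 and make progress.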
Figure 8-9. Resource sharing between threads in the Core i7 microarchitecture. (The figure shows the pipeline stages in order: PC, I-cache and micro-op cache, fetch queue, allocate/renaming, reorder buffer, scheduler, registers, execution, D-cache, register write, and retirement queue, with white and gray boxes marking how each resource is allocated between the two threads.)
In this figure we can see that all the queues are partitioned, with half the slots in each queue reserved for each thread. This way, neither thread can choke off the other. The register allocator and renamer are also partitioned. The scheduler is
dynamically shared, but with a threshold, to prevent either thread from claiming all
of the slots. The remaining pipeline stages are fully shared.
All is not sweetness and light with multithreading, however. There is also a
downside. While partitioning is cheap, dynamic sharing of any resource, especially with a limit on how much a thread can take, requires bookkeeping at run time to
monitor usage. In addition, situations can arise in which programs work much
worse with multithreading than without it. For example, imagine two threads that
each need 3/4 of the cache to function well. Run separately, each one works fine
and has few (expensive) cache misses. Run together, each one has numerous cache
misses and the net result may be far worse than without multithreading.
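The cache-interference scenario can be checked with a rough simulation: two threads, each sweeping a working set equal to 3/4 of the cache, run against a simple LRU cache either alone or interleaved. All sizes here are illustrative, and the LRU model is far simpler than a real set-associative cache, but the qualitative effect matches the text.

```python
# Rough simulation of the cache-interference example: each thread's working
# set (3/4 of the cache) fits when run alone, but the combined working set
# (1.5x the cache) thrashes when the threads share the cache.

from collections import OrderedDict

def miss_rate(refs, cache_lines):
    cache, misses = OrderedDict(), 0
    for addr in refs:
        if addr in cache:
            cache.move_to_end(addr)              # refresh LRU position on a hit
        else:
            misses += 1
            if len(cache) >= cache_lines:
                cache.popitem(last=False)        # evict the least recently used line
            cache[addr] = True
    return misses / len(refs)

CACHE = 1024
ws = 3 * CACHE // 4                              # 3/4 of the cache per thread

# Each thread repeatedly sweeps its own working set.
t0 = [("t0", i) for i in range(ws)] * 8
t1 = [("t1", i) for i in range(ws)] * 8

alone = miss_rate(t0, CACHE)                     # working set fits: only cold misses
together = [x for pair in zip(t0, t1) for x in pair]
shared = miss_rate(together, CACHE)              # cyclic 1.5x working set: thrashing
```

Run alone, a thread's only misses are the cold ones on its first sweep; interleaved, the cyclic combined working set exceeds the cache and nearly every reference misses.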