Hardware Reference
first cycle, but for thread B we immediately hit a problem in the next cycle, so only
one instruction can be issued, and so on.
[Figure: issue slots per cycle for the instructions of threads A, B, and C; panels (a), (b), and (c).]
Figure 8-8. Multithreading with a dual-issue superscalar CPU. (a) Fine-grained
multithreading. (b) Coarse-grained multithreading. (c) Simultaneous
multithreading.
In Fig. 8-8(b), we see how coarse-grained multithreading works with a dual-
issue CPU, but now with a static scheduler that does not introduce a dead cycle
after an instruction that stalls. Basically, the threads are run in turn, with the CPU
issuing two instructions per thread until it hits one that stalls, at which point it
switches to the next thread at the start of the next cycle.
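The switching rule just described can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not in the text: each instruction is a hypothetical (label, stalls) pair, a stalled thread is assumed to be ready again by the time its turn comes around, and the function name coarse_grained is made up for illustration.

```python
from collections import deque

def coarse_grained(threads, width=2):
    """Return the issue schedule as a list of cycles (lists of labels).

    threads: list of instruction lists; each instruction is a
    (label, stalls) tuple. Assumption: a stall only forces a switch
    to the next thread at the start of the next cycle.
    """
    queues = [deque(instrs) for instrs in threads]
    schedule = []
    current = 0
    while any(queues):
        # Skip threads that have already finished.
        while not queues[current]:
            current = (current + 1) % len(queues)
        issued = []
        switch = False
        # Issue up to `width` instructions from the current thread only.
        while queues[current] and len(issued) < width:
            label, stalls = queues[current].popleft()
            issued.append(label)
            if stalls:
                switch = True  # nothing more from this thread this cycle
                break
        schedule.append(issued)
        if switch:
            current = (current + 1) % len(queues)
    return schedule

# Hypothetical instruction streams; True marks an instruction that stalls.
A = [("A1", False), ("A2", True), ("A3", False)]
B = [("B1", True), ("B2", False)]
C = [("C1", False), ("C2", False), ("C3", False)]
for cycle, issued in enumerate(coarse_grained([A, B, C]), start=1):
    print(cycle, issued)
```

In this toy run, the cycle in which B1 stalls issues only one instruction, which is the kind of wasted issue slot that the next scheme tries to fill.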
With superscalar CPUs, a third possible way of doing multithreading is available,
called simultaneous multithreading and illustrated in Fig. 8-8(c). This approach
can be seen as a refinement to coarse-grained multithreading, in which a
single thread is allowed to issue two instructions per cycle as long as it can, but
when it stalls, instructions are immediately taken from the next thread in sequence
to keep the CPU fully occupied. Simultaneous multithreading can also help keep
all the functional units busy. When an instruction cannot be started because a
functional unit it needs is occupied, an instruction from a different thread can be
chosen instead. In this figure, we are assuming that B8 stalls in cycle 11, so C7 is
started in cycle 12.
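The idea of filling an empty issue slot from the next thread within the same cycle can be sketched as follows. This is a simplified model, not the scheduler of any real CPU: instructions are hypothetical (label, stalls) pairs, a stall is assumed to block its thread only for the rest of the current cycle, functional-unit conflicts are not modeled, and the function name simultaneous is made up for illustration.

```python
from collections import deque

def simultaneous(threads, width=2):
    """Return the issue schedule as a list of cycles (lists of labels).

    threads: list of instruction lists; each instruction is a
    (label, stalls) tuple. Assumption: a thread that stalls cannot
    issue again in the same cycle, but is ready in the next one.
    """
    queues = [deque(instrs) for instrs in threads]
    n = len(queues)
    schedule = []
    current = 0
    while any(queues):
        issued = []
        blocked = set()  # threads that stalled during this cycle
        while len(issued) < width:
            # Prefer the current thread, then the next ones in sequence.
            order = [(current + k) % n for k in range(n)]
            pick = next(
                (t for t in order if queues[t] and t not in blocked), None
            )
            if pick is None:
                break  # no thread can fill the remaining slot
            current = pick
            label, stalls = queues[pick].popleft()
            issued.append(label)
            if stalls:
                blocked.add(pick)
        schedule.append(issued)
    return schedule

# Hypothetical instruction streams; True marks an instruction that stalls.
A = [("A1", True), ("A2", False)]
B = [("B1", False), ("B2", False)]
C = [("C1", False)]
for cycle, issued in enumerate(simultaneous([A, B, C]), start=1):
    print(cycle, issued)
```

Here A1 stalls in the first cycle, and the second issue slot of that cycle is filled by B1 rather than being left empty.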
For more information about multithreading, see Gebhart et al. (2011) and
Wing-kei et al. (2011).
Hyperthreading on the Core i7
Having looked at multithreading in the abstract, let us now consider a practical
example: the Core i7. In the early 2000s, processors such as the Pentium 4 were
not delivering the performance boosts that Intel needed to keep up sales. After the
Pentium 4 was already in production, the architects at Intel looked for various
ways to speed it up without changing the programmers' interface, something that
would never have been accepted. Five ways quickly popped up:
 
 