[Figure 21.3 diagram: three 21-stage pipelines with 128 streams each and three tagged memory units, linked by an interconnection network]
Figure 21.3 A conceptual view of a multiprocessor MTA. Each processor has a 21-stage
pipeline, each stage of which can execute an instruction from a different thread. The states
of threads are stored in special hardware called “streams” that allow zero-overhead switching
between threads. A blocked thread (e.g., one waiting for a word from memory
or for a synchronization event) causes no overhead, because the processor simply switches to another
ready thread. To obtain high utilization, the pipeline must be kept as full as possible, which the
comparatively large number of streams (128 per processor) makes easy. The entire memory is uniformly
accessible from all processors and, other than hotspotting (discussed in Section 21.2.2.3), no
locality issues arise on this machine.
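The caption's claim that many streams make it easy to keep the pipeline full can be checked with a toy model. The sketch below assumes, as a simplification rather than a description of the real MTA issue logic, that a stream may have at most one instruction in the pipeline, so after issuing it must wait 21 cycles before issuing again. The function `pipeline_utilization` and its parameters are hypothetical.

```python
def pipeline_utilization(num_streams: int, depth: int = 21, cycles: int = 10_000) -> float:
    """Fraction of cycles in which the processor issues an instruction.

    Toy model (assumption): a stream that issues an instruction cannot
    issue again until that instruction has left the `depth`-stage pipeline.
    """
    ready_at = [0] * num_streams  # cycle at which each stream may issue next
    issued = 0
    next_stream = 0               # round-robin starting point
    for cycle in range(cycles):
        # Scan the streams round-robin for one that is ready this cycle.
        for offset in range(num_streams):
            s = (next_stream + offset) % num_streams
            if ready_at[s] <= cycle:
                ready_at[s] = cycle + depth  # busy while its instruction is in flight
                issued += 1
                next_stream = (s + 1) % num_streams
                break
        # If no stream was ready, this cycle's pipeline slot stays empty.
    return issued / cycles


for n in (7, 21, 128):
    print(n, pipeline_utilization(n))
```

In this model utilization is roughly min(1, streams/21): 7 ready streams fill only about a third of the issue slots, while 21 or more keep every slot busy. Threads blocked on memory or synchronization would lower the count of ready streams, so in this simplified picture the headroom between 21 and 128 streams is what absorbs such stalls.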
during the read operation. If the full/empty bit is not set (that is, the word is empty) when a
readfe() is executed, the corresponding thread suspends (with very low overhead, its state being
saved in the stream hardware) and the read is later retried; the thread resumes when the read
operation has completed. The writeef() (“wait-until-empty then write-and-set-full”)
operation is the complement of readfe(). The readfe(), writeef(), and
similar operations that manipulate the full/empty tag bits are called intrinsics and
compile into individual machine instructions, so they allow very fine-grained
synchronization.
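To make the full/empty semantics concrete, here is a minimal Python sketch that emulates one tagged word in software. It is only an analogy: the class `FullEmptyWord` and the helper `parallel_increment` are hypothetical, and a lock plus condition variable stands in for the MTA hardware, where readfe() and writeef() are single instructions and a blocked access is retried by the hardware rather than by a waiting OS thread.

```python
import threading


class FullEmptyWord:
    """Software emulation of one memory word with a full/empty tag bit.

    Hypothetical class for illustration only: on the MTA the tag bit lives
    in memory and readfe()/writeef() are single machine instructions.
    """

    def __init__(self, value=None, full=False):
        self._value = value
        self._full = full
        self._cond = threading.Condition()

    def readfe(self):
        """Wait until full, then read the value and set the word empty."""
        with self._cond:
            while not self._full:      # reader blocks until a value is present
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()    # wake any writer waiting for empty
            return value

    def writeef(self, value):
        """Wait until empty, then write the value and set the word full."""
        with self._cond:
            while self._full:          # writer blocks until the word is consumed
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()    # wake any reader waiting for full


def parallel_increment(num_threads=4, iterations=1000):
    """Use a single tagged word as both a shared counter and its lock."""
    counter = FullEmptyWord(0, full=True)

    def worker():
        for _ in range(iterations):
            v = counter.readfe()       # word is now empty: other readers wait
            counter.writeef(v + 1)     # word is full again: one waiter proceeds

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.readfe()
```

Because readfe() leaves the word empty, the counter itself acts as a lock: every other worker's readfe() waits until the matching writeef() refills it, so `parallel_increment(4, 1000)` returns 4000 with no separate mutex. On the MTA the same pattern costs one tagged load and one tagged store per update, which is what makes such fine-grained synchronization practical.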