Parallel Algorithms for Bioinformatics - Parallel Computing for Bioinformatics and Computational Biology

[Figure 21.3 diagram: three processors, each labeled "21-stage pipeline" and "128 streams", shown with tagged memory and an interconnection network.]

Figure 21.3 A conceptual view of a multiprocessor MTA. Each processor has a 21-stage pipeline, each stage of which can execute an instruction from a different thread. The states of threads are stored in special hardware called “streams” that allow zero-overhead switching between different threads. A thread that is blocked (e.g., waiting for a word from memory or for a synchronization event) causes no overhead as the processor switches over to another ready thread. To obtain high utilization, the pipeline must be kept as full as possible. This is made easy by the comparatively large number of streams. The entire memory is uniformly accessible from all processors and, other than hotspotting (discussed in Section 21.2.2.3), no locality issues arise on this machine.

during the read operation. If the full/empty bit is not set when a readfe() is executed, the corresponding thread suspends (with very low overhead, its state being saved in the stream hardware) and the operation is later retried. The thread resumes when the read operation has completed. The writeef() (“wait-until-empty then write-and-set-full”) operation is the complement of readfe(). The readfe(), writeef(), and similar operations that manipulate the full/empty tag bits are called intrinsics and compile into individual machine instructions. Thus they allow very fine-grained synchronization.
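
The blocking behavior of readfe() and writeef() can be imitated in software to make the semantics concrete. The following is a minimal sketch, not MTA code: it assumes an ordinary Python interpreter and emulates the full/empty tag bit with a lock and condition variable, whereas on the MTA the bit is part of the memory hardware and each operation is a single machine instruction. The class name TaggedWord and the function run_demo are hypothetical names introduced only for this illustration.

```python
import threading


class TaggedWord:
    """Software emulation of one memory word with a full/empty tag bit.

    Hypothetical helper for illustration only; it does not model the
    cost or the hardware mechanism of the real intrinsics.
    """

    def __init__(self):
        self._value = None
        self._full = False                  # tag bit: the word starts empty
        self._cond = threading.Condition()

    def writeef(self, value):
        """Wait until empty, then write the value and set the bit to full."""
        with self._cond:
            while self._full:               # block while an unread value is present
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()         # wake any reader waiting for "full"

    def readfe(self):
        """Wait until full, then read the value and set the bit to empty."""
        with self._cond:
            while not self._full:           # block until a writer has filled the word
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()         # wake any writer waiting for "empty"
            return value


def run_demo(n_items=5):
    """Pass n_items values from a producer thread to a consumer thread."""
    word = TaggedWord()
    received = []

    def producer():
        for i in range(n_items):
            word.writeef(i * i)

    def consumer():
        for _ in range(n_items):
            received.append(word.readfe())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_demo())   # [0, 1, 4, 9, 16]
```

Because each writeef() must wait for the preceding readfe() to empty the word, the two threads alternate strictly and every value is handed over exactly once, with no separate lock or queue. This per-word handoff is the kind of fine-grained synchronization the intrinsics provide.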
