Fairly soon, large chips will have tens of billions of transistors. Such chips are far too large to design one gate and one wire at a time. The human effort required would render the chips obsolete by the time they were finished. The only feasible approach is to use cores (essentially libraries) containing fairly large subassemblies and to place and interconnect them on the chip as needed. Designers then have to determine which CPU core to use for the control processor and which special-purpose processors to throw in to help it. Putting more of the burden on software running on the control processor makes the system slower but yields a smaller (and cheaper) chip. Having multiple special-purpose processors for audio and video processing takes up chip area, increasing the cost, but produces higher performance at a lower clock rate, which means lower power consumption and less heat dissipation. Thus chip designers increasingly contend with these macroscopic trade-offs rather than worrying about where to place each transistor.
Audiovisual applications are very data intensive. Huge amounts of data have to be processed quickly, so typically 50% to 75% of the chip area is devoted to memory in one form or another, and the amount is rising. The design issues here are numerous. How many levels of cache should be used? Should the cache(s) be split or unified? How big should each cache be? How fast should each be? Should some actual memory go on the chip, too? Should it be SRAM or SDRAM? The answer to each of these questions has major implications for the performance, energy consumption, and heat dissipation of the chip.
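These questions are easiest to see as a design space. The C declarations below are a hypothetical sketch of that space; the names (cache_level, memory_plan, and so on) are invented for illustration and do not come from any real design tool. Each field simply encodes one of the questions above.

```c
#include <stdbool.h>
#include <stdint.h>

enum mem_kind { MEM_SRAM, MEM_SDRAM };   /* SRAM or SDRAM? */

/* One level of on-chip cache. */
struct cache_level {
    bool     split;           /* split I- and D-caches, or unified? */
    uint32_t size_kib;        /* how big should it be?              */
    uint32_t latency_cycles;  /* how fast should it be?            */
};

/* The overall on-chip memory plan. */
struct memory_plan {
    int      num_levels;          /* how many levels of cache?      */
    struct cache_level level[3];  /* parameters for each level      */
    bool     memory_on_chip;      /* put actual memory on chip too? */
    enum mem_kind on_chip_kind;   /* if so, SRAM or SDRAM?          */
};
```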
Besides design of the processors and memory system, another issue of considerable consequence is the communication system: how do all the cores communicate with each other? For small systems, a single bus will usually do the trick, but for larger ones it rapidly becomes a bottleneck. Often the problem can be solved by going to multiple buses or possibly a ring from core to core. In the latter case, arbitration is handled by passing a small packet called a token around the ring. To transmit, a core must first capture the token. When it is done, it puts the token back on the ring so it can continue circulating. This protocol prevents collisions on the ring.
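A short simulation makes the arbitration rule concrete. The C sketch below is illustrative only, not any real ring's implementation (wants_to_send is a made-up stand-in for a core's transmit queue): the token visits each core in turn, and a core may transmit only while holding it, which is why no two cores can ever collide.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_CORES 4

/* Hypothetical stand-in: does this core have data queued to send? */
static bool wants_to_send(int core)
{
    return core % 2 == 0;   /* pretend the even-numbered cores do */
}

int main(void)
{
    int token = 0;   /* the core currently holding the token */

    for (int step = 0; step < 2 * NUM_CORES; step++) {
        if (wants_to_send(token))
            printf("core %d captures the token and transmits\n", token);
        /* Done (or nothing to send): put the token back on the
         * ring so it continues circulating to the next core. */
        token = (token + 1) % NUM_CORES;
    }
    return 0;
}
```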
As an example of an on-chip interconnect, look at the IBM CoreConnect, illustrated in Fig. 8-13. It is an architecture for connecting cores on a single-chip heterogeneous multiprocessor, especially complete system-on-a-chip designs. In a sense, CoreConnect is to one-chip multiprocessors what the PCI bus was to the Pentium: the glue that holds all the parts together. (In modern Core i7 systems, PCIe plays that role, but unlike PCI it is a point-to-point network rather than a shared bus.) However, unlike the PCI bus, CoreConnect was designed without any requirement to be backward compatible with legacy equipment or protocols, and without the constraints of board-level buses, such as limits on the number of pins the edge connector can have.
CoreConnect consists of three buses. The processor bus is a high-speed, synchronous, pipelined bus with 32, 64, or 128 data lines clocked at 66, 133, or 183 MHz. The maximum throughput is thus 23.4 Gbps (vs. 4.2 Gbps for the PCI bus).
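The 23.4-Gbps figure is just the widest data path times the fastest clock. The C snippet below checks both quoted numbers; peak_gbps is a made-up helper for this calculation, not part of any CoreConnect API.

```c
#include <stdio.h>

/* Peak throughput = data-path width (bits) x clock rate (MHz),
 * which gives Mbps; divide by 1000 to get Gbps. */
static double peak_gbps(int data_bits, double clock_mhz)
{
    return data_bits * clock_mhz / 1000.0;
}

int main(void)
{
    printf("CoreConnect processor bus: %.1f Gbps\n", peak_gbps(128, 183.0));
    printf("PCI bus:                   %.1f Gbps\n", peak_gbps(64, 66.0));
    return 0;
}
```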