Chapter 2
Heterogeneous Multicore Architecture
2.1 Architecture Model
In order to satisfy the high-performance and low-power requirements of advanced embedded systems with greater flexibility, it is necessary to exploit on-chip parallel processing by taking advantage of the advances being made in semiconductor integration. Figure 2.1 illustrates the basic architecture of our heterogeneous multicore [1, 2]. Several low-power CPU cores and special purpose processor (SPP) cores, such as a digital signal processor, a media processor, and a dynamically reconfigurable processor, are embedded on a chip. In the figure, the number of CPU cores is m. There are two types of SPP cores on the chip, SPP_a and SPP_b, and the values n and k represent the number of SPP_a and SPP_b cores, respectively. Each processor
core includes a processing unit (PU), a local memory (LM), and a data transfer unit
(DTU) as the main elements. The PU executes various kinds of operations. For
example, in a CPU core, the PU includes arithmetic units, register files, a program
counter, control logic, etc., and executes machine instructions. With some SPP cores
like the dynamically reconfigurable processor, the PU processes a large quantity of data in parallel using its array of arithmetic units.

The LM is a small, low-latency memory that is mainly accessed by the PU of the same core while the PU executes. Some cores may have caches in addition to an LM, or only caches and no LM. An LM is needed to meet the real-time requirements of embedded systems, because the access time to a cache is non-deterministic owing to cache misses.
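The cache-versus-LM contrast can be illustrated with a toy cycle-count model. The sketch below is not taken from the source: it assumes a cold, direct-mapped cache with four 16-byte lines and made-up latencies (`LM_LATENCY`, `CACHE_HIT_LATENCY`, and `CACHE_MISS_PENALTY` are hypothetical values), and it ignores everything else a real core does. It shows only that a given number of LM accesses always costs the same number of cycles, while the cost of the same number of cache accesses depends on the address pattern.

```python
from typing import Sequence

# Hypothetical latencies in cycles, chosen only for illustration.
LM_LATENCY = 1
CACHE_HIT_LATENCY = 1
CACHE_MISS_PENALTY = 20


def lm_cycles(addresses: Sequence[int]) -> int:
    """Every LM access has the same fixed latency, so the total depends
    only on how many accesses are made, not on which addresses they touch."""
    return len(addresses) * LM_LATENCY


def cache_cycles(addresses: Sequence[int], num_lines: int = 4, line_size: int = 16) -> int:
    """Simulate a direct-mapped cache that starts empty (cold).
    The total depends on the access pattern through hits and misses."""
    tags = [None] * num_lines  # tag currently held by each cache line
    cycles = 0
    for addr in addresses:
        block = addr // line_size   # memory block containing this address
        index = block % num_lines   # the only cache line this block can occupy
        tag = block // num_lines    # distinguishes blocks sharing that line
        if tags[index] == tag:
            cycles += CACHE_HIT_LATENCY
        else:
            # A miss costs the hit latency plus the miss penalty,
            # and the new block replaces whatever was in that line.
            cycles += CACHE_HIT_LATENCY + CACHE_MISS_PENALTY
            tags[index] = tag
    return cycles
```

With these assumed parameters, 16 sequential word loads (`list(range(0, 64, 4))`) cost 96 cycles through the cache, whereas 16 loads alternating between addresses 0 and 64 (`[0, 64] * 8`), which map to the same cache line and keep evicting each other, cost 336 cycles. Both traces cost 16 cycles from the LM.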
By contrast, access to an LM is deterministic. By placing a program and its data in the LM, we can accurately estimate the execution cycles of a program that has hard real-time requirements.

A data transfer unit (DTU) is also embedded in each core so that internal operations in the core can execute in parallel with data transfers between cores and memories. While the PU in a core processes data in its LM or cache, the DTU simultaneously executes memory-to-memory data transfers between cores. The DTU is similar to a direct memory access controller (DMAC): it executes commands that transfer data between several kinds of memories, then checks and waits for the end of the data transfer, etc. Some DTUs are capable of