Processor Cores - Heterogeneous Multicore Processor Technologies for Embedded Systems

The register conflict check runs in parallel with the other ID-stage operations. Each candidate register field of an instruction is compared before the field has been identified as a real register field. The comparison result is then judged to be meaningful or not once the instruction decoding logic supplies the instruction format type needed for that identification. This parallelism shortens the ID stage and raises the operating frequency.
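
The idea of comparing first and validating afterward can be sketched in a few lines. The following is a minimal illustration, not the SH-4 logic itself: the 16-bit word with two 4-bit candidate fields, the `Fmt` format types, and the toy opcode mapping in `decode_format` are all assumptions made for this example.

```python
from enum import Enum


class Fmt(Enum):
    """Hypothetical format types: which candidate fields are real register fields."""
    NO_REG = (False, False)  # neither candidate field names a register
    N_ONLY = (True, False)   # only bits 11-8 name a register (e.g. immediate forms)
    NM = (True, True)        # bits 11-8 and bits 7-4 both name registers


def candidate_fields(insn):
    """Extract both 4-bit candidate fields without knowing the format yet."""
    return (insn >> 8) & 0xF, (insn >> 4) & 0xF


def speculative_compare(insn, pending_dest):
    """Compare every candidate field against a pending destination register.

    In hardware this would run in parallel with decode_format().
    """
    return tuple(field == pending_dest for field in candidate_fields(insn))


def decode_format(insn):
    """Toy decoder (assumed opcode mapping, for illustration only)."""
    top = insn >> 12
    if top == 0xE:
        return Fmt.N_ONLY
    if top in (0x3, 0x6):
        return Fmt.NM
    return Fmt.NO_REG


def has_conflict(insn, pending_dest):
    """Keep a speculative match only if the format says the field is a register."""
    raw = speculative_compare(insn, pending_dest)
    valid = decode_format(insn).value
    return any(r and v for r, v in zip(raw, valid))
```

In this sketch, a word such as `0xE12F` matches register 2 in its second candidate field, but the match is discarded because the assumed format marks that field as part of an immediate.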

After the ID stage, processing depends on the pipeline, and each operation is executed according to the instruction information provided by the ID stage. The INT pipeline executes the operation at the EX stage using an ALU, a shifter, and so on; forwards the operation result toward the WB stage at the MA stage; and writes the result back to the register at the WB stage. The LS pipeline calculates the memory access address at the EX stage; loads or stores data at the calculated address in the data cache at the MA stage; and writes back the loaded data and/or the calculated address, if any, to the register at the WB stage. If a cache miss occurs, all the pipelines are stalled to wait for the external memory access. The FE pipeline operations are described later in detail.

The SH-4 adopted the Harvard architecture, which requires simultaneous access to the translation lookaside buffers (TLBs) for instructions and data. A conventional Harvard-architecture processor separates the TLBs symmetrically. The SH-4, however, enhanced the efficiency of the TLBs by breaking the symmetry. Instruction fetch addresses are localized, so a four-entry instruction TLB (ITLB) was enough to suppress ITLB misses. In contrast, data access addresses are less localized and require more entries. Therefore, a 64-entry unified TLB (UTLB) was integrated and used both for data accesses and for ITLB miss handling. ITLB miss handling is supported by hardware and takes only a few cycles if the missed entry is in the UTLB. If a UTLB miss occurs for either kind of access, a TLB miss exception occurs, and the appropriate software miss handler is invoked.
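
The lookup flow described above can be modeled roughly as follows. This is a sketch under stated assumptions rather than a description of the actual hardware: the 4 KB page size, the FIFO replacement in both TLBs, and the names `AsymmetricTLB`, `load_utlb`, and `TLBMissException` are hypothetical and chosen only for illustration. Only the entry counts (4 and 64) and the refill path from the UTLB come from the text.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class TLBMissException(Exception):
    """Stands in for the TLB miss exception handled by software."""


class AsymmetricTLB:
    ITLB_ENTRIES = 4    # small instruction TLB
    UTLB_ENTRIES = 64   # larger unified TLB

    def __init__(self):
        self.itlb = OrderedDict()  # virtual page number -> physical page number
        self.utlb = OrderedDict()

    def load_utlb(self, vpn, ppn):
        """What a software miss handler might do (FIFO eviction is assumed)."""
        if vpn not in self.utlb and len(self.utlb) >= self.UTLB_ENTRIES:
            self.utlb.popitem(last=False)
        self.utlb[vpn] = ppn

    def _utlb_lookup(self, vpn):
        if vpn not in self.utlb:
            raise TLBMissException(hex(vpn))
        return self.utlb[vpn]

    def translate_data(self, vaddr):
        """Data accesses go straight to the UTLB."""
        ppn = self._utlb_lookup(vaddr >> PAGE_SHIFT)
        return (ppn << PAGE_SHIFT) | (vaddr & OFFSET_MASK)

    def translate_fetch(self, vaddr):
        """Instruction fetches use the ITLB and refill it from the UTLB on a miss."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.itlb:
            ppn = self._utlb_lookup(vpn)  # raises if the UTLB also misses
            if len(self.itlb) >= self.ITLB_ENTRIES:
                self.itlb.popitem(last=False)  # assumed FIFO replacement
            self.itlb[vpn] = ppn
        return (self.itlb[vpn] << PAGE_SHIFT) | (vaddr & OFFSET_MASK)
```

The sketch ignores details such as invalidating ITLB entries when the corresponding UTLB entry is replaced; it is meant only to show why an ITLB miss is cheap when the UTLB already holds the mapping.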

The caches of the SH-4 are also asymmetric to enhance efficiency. Since the code size of the SH-4 is smaller than that of a conventional processor, the instruction cache is half the size of the data cache: 8 KB and 16 KB, respectively.

3.1.2.5 Zero-Cycle Data Transfer

Because SH-4 programs contain more transfer instructions than programs for other architectures, the transfer instructions were categorized into the BO group, so that they can be inserted into any unused issue slot. In addition, a zero-cycle transfer operation was implemented for the transfer instructions, which helps reduce hazards.

The result of a transfer instruction already exists at the beginning of the operation, as an immediate value in the instruction code, a value in a source operand register, or a value in flight in a pipeline. It is provided to the pipeline at the ID stage and is then simply forwarded through the pipeline to the WB stage. Therefore, an instruction that immediately follows the transfer instruction and is issued simultaneously in another pipeline can use the result of the transfer, provided that the result is properly forwarded by the source-operand forwarding network.
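
A small behavioral model may help make the zero-cycle idea concrete. It is only a sketch: the tuple encoding of instructions, the function name `issue_pair`, and the choice of an add as the consuming instruction are assumptions for this example, and values in flight in a pipeline are not modeled.

```python
def issue_pair(regs, transfer, consumer):
    """Issue a transfer and a dependent instruction in the same cycle.

    transfer: ('mov', src, dst), where src is an int immediate or a register name
    consumer: ('add', rs, rd), meaning rd = rd + rs, issued to another pipeline
    Returns a new register dictionary after both instructions write back.
    """
    _, src, dst = transfer
    # The transfer result is already known at the ID stage:
    # it is either the immediate itself or the current source register value.
    value = src if isinstance(src, int) else regs[src]

    def read(reg):
        # Source-operand forwarding: the consumer sees the transfer result
        # instead of the stale register-file value.
        return value if reg == dst else regs[reg]

    _, rs, rd = consumer
    result = read(rs) + read(rd)

    # Write back in program order: transfer first, then the consumer.
    updated = dict(regs)
    updated[dst] = value
    updated[rd] = result
    return updated
```

With `regs = {'r0': 5, 'r1': 7, 'r2': 100}`, issuing `('mov', 'r0', 'r1')` together with `('add', 'r1', 'r2')` leaves `r1` at 5 and `r2` at 105, as if the transfer had completed before the add read its operand.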
