The register conflict check runs in parallel with the other ID-stage operations.
Each register-field candidate of the instruction is compared before it is known
whether the candidate is a real register field. Once the instruction decoding logic
has supplied the instruction format type, each comparison result is judged to be
meaningful or not. This parallelism shortens the ID stage and allows a higher
operating frequency.
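The compare-first, validate-later idea can be sketched in a few lines of Python. This is only an illustrative model, not the SH-4 logic: the two 4-bit field positions, the format names, and all function names are assumptions made for the example.

```python
# Hypothetical format table: which candidate fields are real register fields.
# The format names and this table are illustrative, not the SH-4 decoder.
FORMAT_USES = {
    "nm": (True, True),     # both candidates are real register fields
    "n": (True, False),     # only the first candidate is a register field
    "imm": (False, False),  # neither: the bits belong to an immediate
}


def candidate_fields(code: int) -> tuple[int, int]:
    """Extract both 4-bit register-field candidates without decoding."""
    return (code >> 8) & 0xF, (code >> 4) & 0xF


def speculative_compare(code: int, prev_dest: int) -> tuple[bool, bool]:
    """Compare every candidate with the preceding destination register.

    No format information is needed, so this can run beside decoding.
    """
    n, m = candidate_fields(code)
    return n == prev_dest, m == prev_dest


def conflict(code: int, prev_dest: int, fmt: str) -> bool:
    """Keep only the matches whose candidate is a real register field."""
    raw = speculative_compare(code, prev_dest)  # done in parallel with decode
    valid = FORMAT_USES[fmt]                    # arrives from the decoder
    return any(r and v for r, v in zip(raw, valid))
```

In this model the comparison is off the critical path of decoding; only the final masking step waits for the format type.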
After the ID stage, the operation depends on the pipeline and is executed according
to the instruction information provided from the ID stage. The INT pipeline
executes the operation at the EX stage using an ALU, a shifter, and so on; forwards
the operation result to the WB stage at the MA stage; and writes back the result to the
register at the WB stage. The LS pipeline calculates the memory access address at the
EX stage, loads or stores the data at the calculated address in the data cache at the MA
stage, and writes back the loaded data and/or the calculated address, if any, to the
register at the WB stage. If a cache miss occurs, all the pipelines are stalled to wait
for the external memory access. The FE pipeline operations are described later in detail.
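As a rough illustration of the LS pipeline stages, the following Python sketch walks a single load through EX, MA, and WB. The miss penalty, the dictionary-based cache, and the function name are assumptions made for the example; the text does not give the actual stall length.

```python
MISS_PENALTY = 10  # assumed stall cycles for an external memory access


def run_load(regs, cache, memory, base, offset, dest):
    """Walk one load through EX, MA, and WB; return the cycles spent."""
    cycles = 0

    # EX: calculate the memory access address.
    addr = regs[base] + offset
    cycles += 1

    # MA: access the data cache; a miss stalls while memory is read.
    if addr not in cache:
        cycles += MISS_PENALTY
        cache[addr] = memory[addr]
    value = cache[addr]
    cycles += 1

    # WB: write the loaded data back to the register.
    regs[dest] = value
    cycles += 1
    return cycles
```

Running the same load twice shows the effect of the stall: the first access pays the assumed penalty, the second hits in the cache.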
The SH-4 adopted the Harvard architecture, which requires simultaneous access
to the translation lookaside buffers (TLBs) for instructions and data. A conventional
Harvard-architecture processor separated the TLBs symmetrically, but the
SH-4 enhanced the efficiency of the TLBs by breaking the symmetry. Instruction
fetch addresses are localized, so a four-entry instruction TLB (ITLB) was
enough to suppress TLB misses. In contrast, data access addresses are
less localized and require more entries. Therefore, a 64-entry unified TLB (UTLB)
was integrated and used both for data accesses and for ITLB miss handling. ITLB
miss handling is supported by hardware and takes only a few cycles if the ITLB-missed
entry is in the UTLB. If a UTLB miss occurs for either kind of access, a TLB miss
exception is raised and handled by software.
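The asymmetric lookup path can be modeled as below. The entry counts come from the text; the class and method names, the oldest-first replacement policy, and the use of bare page numbers are assumptions made for the sketch.

```python
from collections import OrderedDict

ITLB_ENTRIES = 4    # from the text
UTLB_ENTRIES = 64   # from the text


class TlbMiss(Exception):
    """Stands in for the TLB miss exception handled by software."""


class AsymmetricTlb:
    def __init__(self):
        self.itlb = OrderedDict()  # virtual page -> physical page
        self.utlb = OrderedDict()

    def load_utlb(self, vpn, ppn):
        """What a software miss handler might do (oldest-first is assumed)."""
        if vpn not in self.utlb and len(self.utlb) >= UTLB_ENTRIES:
            self.utlb.popitem(last=False)
        self.utlb[vpn] = ppn

    def translate_data(self, vpn):
        """Data accesses use the UTLB directly."""
        if vpn not in self.utlb:
            raise TlbMiss(vpn)
        return self.utlb[vpn]

    def translate_fetch(self, vpn):
        """Instruction fetches try the ITLB first, then refill from the UTLB."""
        if vpn in self.itlb:
            return self.itlb[vpn]
        if vpn not in self.utlb:       # missed in both: software must handle it
            raise TlbMiss(vpn)
        if len(self.itlb) >= ITLB_ENTRIES:
            self.itlb.popitem(last=False)
        self.itlb[vpn] = self.utlb[vpn]  # hardware-style refill
        return self.itlb[vpn]
```

Only a miss in the UTLB raises the exception; an ITLB miss that hits in the UTLB is resolved inside `translate_fetch` without involving the software handler.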
The caches of the SH-4 are also asymmetric to enhance efficiency. Since the
code size of SH-4 programs is smaller than that of a conventional processor, the
instruction cache is half the size of the data cache: 8 KB and 16 KB, respectively.
3.1.2.5 Zero-Cycle Data Transfer
Since an SH-4 program contains more transfer instructions than programs for
other architectures, the transfer instructions were categorized into the BO group,
so that they can be inserted into any unused issue slot. Further, a zero-cycle
transfer operation was implemented for the transfer instructions, which helps to
reduce hazards.
The result of a transfer instruction already exists at the beginning of the operation
as an immediate value in the instruction code, a value in a source operand
register, or a value on the fly in a pipeline. It is provided to the pipeline at the ID
stage and is simply forwarded through the pipeline to the WB stage. Therefore, an
instruction issued right after the transfer instruction and executed simultaneously in
another pipeline can use the result of the transfer instruction, provided that the
result is properly forwarded by the source-operand forwarding network.
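A minimal Python sketch of this behavior follows. It pairs a transfer with a dependent add issued in the same cycle; the tuple encodings and function names are hypothetical, and the "value on the fly in a pipeline" case is not modeled.

```python
def transfer_value(regs, kind, operand):
    """The transfer result is known at ID: an immediate or a register value."""
    return operand if kind == "imm" else regs[operand]


def issue_pair(regs, transfer, consumer):
    """Issue a transfer and a dependent add in the same cycle.

    transfer = (kind, operand, dest); consumer = (src, dest), dest += src.
    Both encodings are hypothetical and exist only for this sketch.
    """
    kind, operand, t_dest = transfer
    value = transfer_value(regs, kind, operand)
    src, c_dest = consumer

    def read(reg):
        # Source-operand forwarding: use the transfer result instead of the
        # stale register contents when the consumer reads the transfer's dest.
        return value if reg == t_dest else regs[reg]

    result = read(c_dest) + read(src)

    # WB: both results are written; the consumer is later in program order.
    regs[t_dest] = value
    regs[c_dest] = result
    return regs
```

Without the forwarding in `read`, the add would see the old contents of the transfer's destination register and would have to wait a cycle for the correct value.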