3.1.2.3 Asymmetric Superscalar Architecture
The asymmetric superscalar architecture is sensitive to instruction categorization,
because two instructions of the same category cannot be issued simultaneously. For
example, if we put all floating-point instructions in a single category, we can reduce
the number of floating-point register ports, but we cannot issue a floating-point
arithmetic instruction together with a floating-point load/store/transfer instruction.
This degrades performance. Therefore, the categorization requires a careful trade-off
between performance and hardware cost.
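
The trade-off above can be illustrated with a minimal sketch. It assumes only the rule stated in the text, that two instructions of the same category cannot be issued together. The instruction names and the category labels (FP, FP_ARITH, FP_MEM) are hypothetical and chosen for illustration only.

```python
# Hypothetical categorizations (illustrative names, not taken from the SH-4 manuals).
# Option 1: every floating-point instruction shares one category.
SINGLE_FP_CATEGORY = {"fp_add": "FP", "fp_load": "FP", "int_add": "INT"}
# Option 2: floating-point arithmetic and load/store/transfer are separated.
SPLIT_FP_CATEGORIES = {"fp_add": "FP_ARITH", "fp_load": "FP_MEM", "int_add": "INT"}


def can_dual_issue(categorization, first, second):
    """Return True if the two instructions fall in different categories.

    This models only the same-category restriction described in the text;
    data dependencies and other hazards are ignored.
    """
    return categorization[first] != categorization[second]


# With a single floating-point category, the arithmetic/load pair is blocked.
blocked = can_dual_issue(SINGLE_FP_CATEGORY, "fp_add", "fp_load")
# With split categories, the same pair can be issued together.
allowed = can_dual_issue(SPLIT_FP_CATEGORIES, "fp_add", "fp_load")
```

The split categorization buys the extra issue opportunity at the price of more register file ports, which is the hardware cost the text refers to.
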
First of all, the integer and load/store instructions were the most frequently used,
and they were categorized into separate integer (INT) and load/store (LS) groups,
respectively. This categorization required an address calculation unit in addition to
the conventional arithmetic logic unit (ALU). Branch instructions account for about
one fifth of a program on average. However, it was difficult to use the ALU or the
address calculation unit to implement the early-stage branch, which calculated
branch addresses one stage earlier than other types of operations. Therefore, branch
instructions were categorized into a separate branch (BR) group with its own
branch-address calculation unit. As a result, the SH-4 had three calculation units,
but the performance enhancement justified the additional hardware.
Even a RISC processor had special instructions that did not fit superscalar issue.
For example, an instruction that changed the processor state was categorized into
a nonsuperscalar (NS) group, because most instructions could not be issued with it.
The SH-4 frequently used instructions that transfer a literal or register value to a
register because of its 16-bit fixed-length ISA. Therefore, these transfer instructions
were categorized into a BO group that was executable on both the integer and
load/store (INT and LS) pipelines, which were originally for the INT and LS groups.
A transfer instruction could then be issued with no resource conflict. A typical
program could not use all the instruction issue slots of a conventional RISC
architecture, which has three-operand instructions and uses transfer instructions
less frequently. The extra transfer instructions of the SH-4 could be inserted easily,
with no resource conflict, into issue slots that would be empty on a conventional RISC.
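
The role of the BO group can be sketched as a small resource model. This is an illustration under stated assumptions, not the actual SH-4 issue logic: the pipeline names ("int", "ls", "br") are hypothetical, and the model simply says that two instructions can be issued together when each can be given a different pipeline. Under this model two BO instructions can also pair, which is a consequence of the model rather than a claim made in the text.

```python
from itertools import product

# Pipelines each group may execute on (hypothetical pipeline names).
# BO is the only group that may use either the integer or the load/store pipeline.
PIPELINES = {
    "INT": {"int"},
    "LS": {"ls"},
    "BR": {"br"},
    "BO": {"int", "ls"},
}


def can_pair(group_a, group_b):
    """Return True if both instructions can be assigned distinct pipelines.

    NS (nonsuperscalar) instructions are modeled as always issuing alone.
    """
    if "NS" in (group_a, group_b):
        return False
    return any(p != q for p, q in product(PIPELINES[group_a], PIPELINES[group_b]))
```

For example, `can_pair("BO", "INT")` and `can_pair("BO", "LS")` both hold because the transfer instruction takes whichever pipeline the other instruction leaves free, while `can_pair("INT", "INT")` fails because both instructions need the same pipeline.
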
As mentioned above, placing all the floating-point instructions in a single group
increased pipeline hazards. Therefore, the floating-point load/store/transfer
instructions and the floating-point arithmetic instructions were categorized into the
LS group and a floating-point execution (FE) group, respectively. This categorization
increased the number of ports of the floating-point register file. However, the
performance enhancement justified the increase.
The floating-point transfer instructions were not categorized into the BO group,
because neither the INT nor the FE group suited them. The INT pipeline could not
access the floating-point register file, and the FE pipeline was too complicated to
handle a simple transfer operation. Further, a transfer instruction was often issued
together with an FE group instruction, so categorizing it into any group other than
FE was sufficient for performance.
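
A rough way to see why it was enough to keep floating-point transfers out of the FE group is to count issue cycles for an alternating transfer/arithmetic sequence. The sketch below is a simplified in-order, two-way issue model that ignores data dependencies and latencies; the instruction sequence and the alternative "transfer in FE" classification are hypothetical and used only for comparison.

```python
def issue_cycles(groups, can_pair):
    """Count cycles for in-order issue of up to two instructions per cycle.

    Each cycle issues the next instruction, plus the one after it when the
    pairing rule allows. Dependencies and latencies are ignored.
    """
    cycles = 0
    i = 0
    while i < len(groups):
        if i + 1 < len(groups) and can_pair(groups[i], groups[i + 1]):
            i += 2  # both instructions issue in this cycle
        else:
            i += 1  # only one instruction issues
        cycles += 1
    return cycles


def different_group(a, b):
    """Pairing rule from the text: same-group instructions cannot issue together."""
    return a != b


# Sequence: transfer, arithmetic, transfer, arithmetic (all floating-point).
transfer_in_ls = ["LS", "FE", "LS", "FE"]  # categorization described in the text
transfer_in_fe = ["FE", "FE", "FE", "FE"]  # hypothetical alternative

cycles_ls = issue_cycles(transfer_in_ls, different_group)
cycles_fe = issue_cycles(transfer_in_fe, different_group)
```

In this model the sequence with transfers in the LS group issues in half the cycles of the hypothetical all-FE categorization, since every transfer pairs with the arithmetic instruction that follows it.
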