Processor Cores - Heterogeneous Multicore Processor Technologies for Embedded Systems


3.1.2.3 Asymmetric Superscalar Architecture

The asymmetric superscalar architecture is sensitive to how instructions are categorized, because two instructions of the same category cannot be issued simultaneously. For example, if we place all floating-point instructions in one category, we can reduce the number of floating-point register ports, but we cannot issue a floating-point arithmetic instruction together with a floating-point load/store/transfer instruction. This degrades performance. The categorization therefore requires a careful trade-off between performance and hardware cost.

First, integer and load/store instructions were the most frequently used, and they were assigned to separate groups: integer (INT) and load/store (LS), respectively. This categorization required an address calculation unit in addition to the conventional arithmetic logic unit (ALU). Branch instructions make up about one fifth of a program on average. However, it was difficult to use the ALU or the address calculation unit to implement the early-stage branch, which calculates the branch address one stage earlier than other types of operations. Branch instructions were therefore assigned to a separate branch (BR) group with its own branch-address calculation unit. As a result, the SH-4 had three calculation units, but the performance gain justified the additional hardware.

Even a RISC processor has special instructions that do not fit superscalar issue. For example, an instruction that changes the processor state was assigned to a nonsuperscalar (NS) group, because most instructions could not be issued together with it.

Because of its 16-bit fixed-length ISA, the SH-4 frequently uses instructions that transfer a literal or register value to a register. These transfer instructions were therefore assigned to a BO group, which can execute on both the INT and LS pipelines originally provided for the INT and LS groups. A transfer instruction could then be issued with no resource conflict. A typical program cannot fill all the instruction issue slots of a conventional RISC architecture, which has three-operand instructions and uses transfer instructions less frequently. The extra transfer instructions of the SH-4 could easily be placed, with no resource conflict, in issue slots that would be empty on a conventional RISC.
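
The grouping described so far can be summarized as a pairing check. The following Python sketch is an illustration under stated assumptions, not the SH-4's actual issue logic. The function name `can_dual_issue` is hypothetical. The sketch simplifies the NS group to "never pairs", although the text says only that most instructions cannot be issued with it, and it assumes that two BO instructions can pair by taking one pipeline each, which the text implies but does not state.

```python
from enum import Enum


class Group(Enum):
    INT = "integer"
    LS = "load/store"
    BR = "branch"
    FE = "floating-point execution"
    BO = "runs on either the INT or the LS pipeline"
    NS = "nonsuperscalar"


def can_dual_issue(a: Group, b: Group) -> bool:
    """Return True if instructions of groups a and b may issue in the same cycle."""
    # Simplification: an NS instruction is always issued alone.
    if Group.NS in (a, b):
        return False
    # A BO instruction takes whichever of the INT or LS pipelines is free.
    # Assumption: two BO instructions pair by using one pipeline each.
    if Group.BO in (a, b):
        return True
    # Otherwise, two instructions of the same group compete for one unit.
    return a != b
```

For example, `can_dual_issue(Group.INT, Group.INT)` is rejected because both instructions need the single INT pipeline, while `can_dual_issue(Group.BO, Group.INT)` is accepted because the transfer can use the LS pipeline.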

As mentioned above, placing all floating-point instructions in a single group increased pipeline hazards. The floating-point load/store/transfer instructions were therefore assigned to the LS group, and the floating-point arithmetic instructions to a floating-point execution (FE) group. This categorization increased the number of ports on the floating-point register file, but the performance gain justified the increase.
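
To make the effect of the two floating-point categorizations concrete, the sketch below counts issue cycles for a short instruction sequence. It is a toy model under loud assumptions: in-order dual issue, no data dependencies or latencies, and only the basic rule that two instructions pair when their categories differ. The mnemonics `fload`, `fadd`, and `fmul` and the helper `issue_cycles` are hypothetical and are not taken from the SH-4 instruction set.

```python
def issue_cycles(categories):
    """Count cycles for in-order dual issue; a pair issues only if categories differ."""
    cycles = 0
    i = 0
    while i < len(categories):
        if i + 1 < len(categories) and categories[i] != categories[i + 1]:
            i += 2  # both instructions issue in this cycle
        else:
            i += 1  # only one instruction issues in this cycle
        cycles += 1
    return cycles


# Hypothetical sequence alternating floating-point loads and arithmetic.
program = ["fload", "fadd", "fload", "fmul", "fload", "fadd"]

# Option 1: every floating-point instruction in one category.
single_fp_group = {"fload": "FP", "fadd": "FP", "fmul": "FP"}
# Option 2: load/store/transfer in LS, arithmetic in FE.
split_groups = {"fload": "LS", "fadd": "FE", "fmul": "FE"}

cycles_single = issue_cycles([single_fp_group[op] for op in program])
cycles_split = issue_cycles([split_groups[op] for op in program])
```

In this toy model the single-category mapping needs six cycles for the six instructions, while the LS/FE split needs three, because every load can pair with the arithmetic instruction that follows it. Real gains depend on dependencies and latencies that the sketch ignores.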

The floating-point transfer instructions were not assigned to the BO group, because neither the INT nor the FE group suits them. The INT pipeline could not access the floating-point register file, and the FE pipeline was too complicated to handle a simple transfer operation. Furthermore, a transfer instruction was often issued together with an FE group instruction, so assigning it to any group other than FE was sufficient for performance.
