[Figure 4.6: block diagram. Labels recoverable from the figure: banks 0 through N of 1KByte each, each with a CAM tag array, wordlines, a data array, sense amps, and align & output logic; per-bank enable signals (Bank0 enable, Bank1 enable, ..., BankN enable) and a cache enable; a clock generation block with CAMCLK0, GCLKA0, GCLKB0 (the A & B sense-amp clocks), and EGCLK; address fields tag and offset.]
FIGURE 4.6: 32-Bank CAM-tag cache in Xscale. Adapted from [58].
One of Xscale's distinguishing low-power architectural features is the CAM-tag
organization of its 32KB instruction and data caches. A CAM-tag cache organization (as
opposed to a RAM-tag organization) combines address decoding with tag comparison in one
step. It allows highly associative caches (e.g., 32-way in the Xscale) with very low miss rates
while remaining very power efficient at that performance level [244].
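As a rough behavioral illustration of how a CAM-tag lookup merges decoding and tag comparison, the following Python sketch models a single group of 32 CAM rows. It is not a description of the actual Xscale circuit: the names CamRow and cam_tag_lookup, the tag values, and the data contents are all hypothetical and chosen only for the example.

```python
from typing import List, Optional


class CamRow:
    """One CAM row: a stored tag plus the data line its match line selects.

    Hypothetical model for illustration only.
    """

    def __init__(self) -> None:
        self.valid = False
        self.tag = 0
        self.data = b""


def cam_tag_lookup(rows: List[CamRow], tag: int) -> Optional[bytes]:
    # The search tag is broadcast to all rows; each row compares in place.
    match_lines = [row.valid and row.tag == tag for row in rows]
    # A given tag is stored at most once, so at most one match line fires.
    assert sum(match_lines) <= 1
    # The match line acts as the data-array wordline, so there is no
    # separate "decode the row, then compare the tag" sequence.
    for row, fired in zip(rows, match_lines):
        if fired:
            return row.data
    return None  # no match line fired: cache miss


rows = [CamRow() for _ in range(32)]  # 32 rows, as in the 32-way example
rows[7].valid, rows[7].tag, rows[7].data = True, 0x2ABCD, b"line-7"

hit = cam_tag_lookup(rows, 0x2ABCD)   # tag present in row 7
miss = cam_tag_lookup(rows, 0x11111)  # tag not stored anywhere
```

In this model the row that matches is the row that is read, which is the sense in which decoding and comparison happen in one step.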
Figure 4.6 shows the organization of a 32KB cache in the Xscale. The cache comprises 32
independent banks of 1KB each. Each bank is composed of a CAM-tag array and a data array.
A tag match in the CAM drives the corresponding wordline of the data array. The cache is
extensively clock-gated: only one of the 32 banks (1KB) is enabled during an access. This limits
the CAM rows that are searched to the rows of a single 32-way set. Once the clock for the
CAM tag match of a bank is gated (the CAMCLK in Figure 4.6), no additional clocks can be
generated for that bank (i.e., clocks A and B in Figure 4.6), which inhibits any further activity in it. Clark
et al. emphasize that this extensive clock gating goes beyond any previous design, making
this a very power-efficient cache [58].
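The bank arithmetic above can be sketched in Python. The bank count, bank size, and associativity come from the text, and the 32-byte line size follows from them (1KB per bank divided by 32 ways). Two things are assumptions made only for this example: a 32-bit address, and selecting the bank with the address bits directly above the line offset. The function names are hypothetical, and the ungated case is a notional baseline for comparison, not a design described in the text.

```python
NUM_BANKS = 32                    # from the text: 32 independent banks
BANK_BYTES = 1024                 # from the text: 1KB per bank
WAYS = 32                         # from the text: one 32-way set per bank
LINE_BYTES = BANK_BYTES // WAYS   # derived: 32 bytes per line
ADDR_BITS = 32                    # assumption for this sketch

OFFSET_BITS = (LINE_BYTES - 1).bit_length()      # bits to address a byte in a line
BANK_BITS = (NUM_BANKS - 1).bit_length()         # bits to pick one bank
TAG_BITS = ADDR_BITS - OFFSET_BITS - BANK_BITS   # remaining bits form the CAM tag


def split_address(addr: int) -> tuple:
    """Split an address into (tag, bank, offset).

    Assumption: the bank is selected by the bits just above the offset.
    """
    offset = addr & (LINE_BYTES - 1)
    bank = (addr >> OFFSET_BITS) & (NUM_BANKS - 1)
    tag = addr >> (OFFSET_BITS + BANK_BITS)
    return tag, bank, offset


def bank_enables(addr: int) -> list:
    """One enable per bank; exactly one is asserted for any access."""
    _, bank, _ = split_address(addr)
    return [b == bank for b in range(NUM_BANKS)]


def cam_rows_searched(addr: int, clock_gated: bool = True) -> int:
    """CAM rows that take part in the tag search for one access."""
    enabled = sum(bank_enables(addr)) if clock_gated else NUM_BANKS
    return enabled * WAYS


tag, bank, offset = split_address(0x1234)
gated = cam_rows_searched(0x1234)                       # only the enabled bank
ungated = cam_rows_searched(0x1234, clock_gated=False)  # notional baseline
```

Under these assumptions, gating confines the search to the 32 rows of the selected bank instead of all 1024 rows in the cache.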
4.3 IDLE-WIDTH SWITCHING ACTIVITY: CORE
Idle-Width switching activity is the excessive switching activity which arises from a mismatch
between the designed bit-width of a processor and the actual bit-width needed in frequently
 