[Figure 4.6: block diagram. Recoverable labels: BANK 0, BANK 1, ..., BANK N (1KByte each); Bank0/Bank1/BankN enable; Cache enable; CAM tag; wordlines; data array; sense amps; align & output; align & deliver; A & B clock generation; EGCLK; CAMCLK0; GCLKA0; GCLKB0; tag; offset.]
FIGURE 4.6: 32-Bank CAM-tag cache in Xscale. Adapted from [58].
One of Xscale's distinguishing low-power architectural features is the CAM-tag
organization of its 32KB instruction and data caches. A CAM-tag cache organization (as
opposed to a RAM-tag organization) combines address decoding with tag comparison in one
step. It allows highly associative caches (e.g., 32-way in the Xscale) with very low miss rates
while remaining very power efficient at that performance level [244].
Figure 4.6 shows the organization of a 32KB cache in the Xscale. The cache comprises 32
independent banks of 1KB each. Each bank is composed of a CAM-tag array and a data array.
A tag match in the CAM drives the corresponding wordline of the data array. The cache is
extensively clock-gated: only one of the 32 banks (1KB) is enabled during an access. This limits
the CAM rows that are searched to the rows of a single 32-way set. Once the clock for the
CAM tag match of a bank is gated (the CAMCLK in Figure 4.6), no additional clocks can be
generated for that bank (i.e., clocks A and B in Figure 4.6), inhibiting any further activity. Clark
et al. emphasize that this extensive clock gating goes beyond any previous design in making
this a very power-efficient cache [58].
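The lookup path described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the Xscale implementation: the 32-byte line size is inferred from 1KB per bank divided by 32 ways, the choice of address bits used to select a bank is hypothetical, and the names split_address, Bank, and CamTagCache are invented for illustration. The counter cam_searches stands in for CAM clock activity, so that the effect of enabling a single bank per access is visible.

```python
from dataclasses import dataclass, field

NUM_BANKS = 32    # from the text: 32 independent banks
WAYS = 32         # from the text: one 32-way set per bank
LINE_BYTES = 32   # assumption: 1KB bank / 32 ways
OFFSET_BITS = 5   # log2(LINE_BYTES)
BANK_BITS = 5     # log2(NUM_BANKS); which address bits pick the bank is assumed


def split_address(addr):
    """Split an address into (tag, bank, offset) under the assumed layout."""
    offset = addr & (LINE_BYTES - 1)
    bank = (addr >> OFFSET_BITS) & (NUM_BANKS - 1)
    tag = addr >> (OFFSET_BITS + BANK_BITS)
    return tag, bank, offset


@dataclass
class Bank:
    tags: list = field(default_factory=lambda: [None] * WAYS)
    data: list = field(default_factory=lambda: [bytes(LINE_BYTES)] * WAYS)
    cam_searches: int = 0  # how often this bank's CAM was clocked

    def lookup(self, tag):
        # A real CAM compares all rows in parallel; the loop models that
        # single step. A matching row selects the data-array wordline.
        self.cam_searches += 1
        for row, stored in enumerate(self.tags):
            if stored == tag:
                return self.data[row]
        return None  # miss: no wordline is driven


class CamTagCache:
    def __init__(self):
        self.banks = [Bank() for _ in range(NUM_BANKS)]

    def fill(self, addr, line, way):
        # Replacement policy is not modeled; the caller picks the way.
        tag, bank, _ = split_address(addr)
        self.banks[bank].tags[way] = tag
        self.banks[bank].data[way] = line

    def read_byte(self, addr):
        tag, bank, offset = split_address(addr)
        # Only the selected bank is enabled; the other 31 stay idle.
        line = self.banks[bank].lookup(tag)
        return None if line is None else line[offset]
```

In this model every access touches exactly one Bank object, so the sum of cam_searches over all banks equals the number of accesses, whereas a design that searched every bank would clock 32 CAMs per access.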
4.3 IDLE-WIDTH SWITCHING ACTIVITY: CORE
Idle-Width switching activity is the excessive switching activity which arises from a mismatch
between the designed bit-width of a processor and the actual bit-width needed in frequently