Information Technology Reference
In-Depth Information
Memory Modules
000
001
010
011
100
101
110
111
1111
a
0000
d
d
r
Double-Buffered Registers
Figure 11.14 Eight interleaved memory modules with double buffering. Addresses are supplied
in parallel while data is pipelined into and out of the memory.
allows the register sets to be alternated so that data can be moved continuously between the
CPU and the modules. This allows the interleaved memory to be about 2 r times slower than
the CPU and yet, with a small set of fast registers, appear to be as fast as the CPU. This works
only if the program accessing memory does not branch to a new set of words. If it does, the
startup time to access a new word is about 2 r times the CPU speed. Thus, an interleaved
random-access memory also requires time of the form α + to access k words. For example,
for a moderately fast random-access chip technology α might be 80 nanoseconds whereas β
might be 10 nanoseconds, a ratio of 8 to 1.
This discussion justifies assuming that the time to move k words with consecutive addresses
to and from the l th unit in the memory hierarchy is α l + l for positive constants α l and
β l ,where α l is typically much larger than β l .If k = b l =
2 α l
and the time to retrieve one item and b l items is about the same. Thus, efficiency dictates that
items should be fetched in blocks, especially if all or most of the items in a block can be used if
one of them is used. This justifies the block-I/O model described below. Here we let t l be the
time to move a block at level l . We add the requirement that data stored together be retrieved
together to reflect physical constraints existing in practice.
α l l
,then α l + l
DEFINITION 11.6.1 (Block-I/O Model) At the l th level in a memory hierarchy, I/O operations
are performed on blocks. The block size and the time in seconds to access a block at the l th level are
b l and t l , respectively. For each l , b l /b l− 1 is an integer. In addition, any data written as part of a
block at level l must be read into level l
1 by reading the entire block in which it was stored.
The lower bounds on the number of I/O steps given in Section 11.5 can be generalized to
the block-I/O case by dividing the number of I/O operations by the size b l of blocks moving
between levels l
1and l . This lower bound can be achieved for matrix-vector and matrix-
matrix multiplication because data is always written to and read from the higher-level memory
in the same way for these problems. (See Problems 11.13 and 11.14 .)
 
Search WWH ::




Custom Search