Memory Hierarchy Design - Computer Architecture: A Quantitative Approach

Hardware Reference

In-Depth Information

Case Studies and Exercises by Norman P. Jouppi,

Naveen Muralimanohar, and Sheng Li

Case Study 1: Optimizing Cache Performance Via

Advanced Techniques

Concepts illustrated by this case study

■ Non-blocking Caches

■ Compiler Optimizations for Caches

■ Software and Hardware Prefetching

■ Calculating Impact of Cache Performance on More Complex Processors

The transpose of a matrix interchanges its rows and columns; this is illustrated below:

Here is a simple C loop to show the transpose:

for (i = 0; i < 3; i++) {

for (j = 0; j < 3; j++) {

output[j][i] = input[i][j];

}

Assume that both the input and output matrices are stored in the row major order ( row major

order means that the row index changes fastest). Assume that you are executing a 256 × 256

double-precision transpose on a processor with a 16 KB fully associative (don't worry about

cache conflicts) least recently used (LRU) replacement L1 data cache with 64 byte blocks. As-

sume that the L1 cache misses or prefetches require 16 cycles and always hit in the L2 cache,

and that the L2 cache can process a request every two processor cycles. Assume that each it-

eration of the inner loop above requires four cycles if the data are present in the L1 cache. As-

sume that the cache has a write-allocate fetch-on-write policy for write misses. Unrealistically,

assume that writing back dirty cache blocks requires 0 cycles.

2.1 [10/15/15/12/20] <2.2> For the simple implementation given above, this execution order

would be nonideal for the input matrix; however, applying a loop interchange optimiza-

tion would create a nonideal order for the output matrix. Because loop interchange is not

suicient to improve its performance, it must be blocked instead.

a. [10] <2.2> What should be the minimum size of the cache to take advantage of blocked

execution?

b. [15] <2.2> How do the relative number of misses in the blocked and unblocked ver-

sions compare in the minimum sized cache above?

Search WWH ::

Custom Search

Home