workloads. But a shared LLC increases hit time because of its much larger cache size compared to a private LLC.
Replacement policy plays a major role in the performance of CMP last-level caches [1]. The most efficient cache replacement algorithm is to always discard the block that will not be needed for the longest time in the future. This optimal policy is referred to as Belady's optimal algorithm [2], but it is not implementable in practice because it is generally impossible to predict how far in the future a block will be needed. Recent studies have found a huge performance gap between LRU and Belady's optimal algorithm: the miss rate under the LRU policy can be up to 197% higher than under optimal replacement [3].
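To make the gap concrete, the two policies can be compared with a small trace-driven sketch (an illustration, not from the paper): LRU evicts the least recently used block online, while Belady's algorithm looks ahead in the trace and evicts the block whose next use is farthest in the future.

```python
from collections import OrderedDict

def lru_misses(trace, capacity):
    """Simulate an LRU cache over a block-address trace and count misses."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = True
    return misses

def belady_misses(trace, capacity):
    """Offline optimal (Belady): evict the block reused farthest in the future."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            def next_use(b):
                try:
                    return trace.index(b, i + 1)
                except ValueError:
                    return float('inf')    # never reused again: ideal victim
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses
```

A cyclic trace over four blocks with a three-entry cache shows the pathology: LRU misses on every access (16 misses over 16 references), whereas Belady's algorithm misses only 8 times.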
As mentioned in [4], the least-recently-used (LRU) policy has two major disadvantages. First, a cache block can remain in the cache long after its last use, because it takes a long time for the block to age into the LRU position; until then, the block occupies cache space unnecessarily. Such blocks are called dead blocks [4]. The gap between a block's death time (the time when the block is last used) and its replacement time worsens as associativity increases. Second, some blocks in L2 may never be reused; such blocks are dead immediately after being loaded into the L2 cache. This is caused by bursty spatial reuse of different bytes of the same cache line, which is aggravated as block size increases. Such bursty reuse makes the block MRU (most recently used) in L1 while it is accessed only once in L2: even though the block is accessed rapidly in L1, it is touched in L2 just once. So the principle of temporal locality holds in L1 but not in L2. These blocks are called never-reused blocks [4]. Dead and never-reused blocks are an unnecessary burden on the LRU policy and degrade performance.
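The first problem can be quantified with a toy model (an illustration under assumed worst-case traffic, not from the paper): once a block in an LRU set is dead, at least `associativity` further misses to that set must occur before it is finally evicted, so its useless residence time grows linearly with associativity.

```python
def dead_interval(associativity):
    """Misses to a set between a block's last use and its eviction under LRU.

    Toy model: the dead block 'X' sits at the MRU position after its last
    use, and every subsequent access to the set is a miss on a new block.
    Intervening hits would only delay eviction further.
    """
    cache = ['X']                      # 'X' just used: MRU end of the stack
    t = 0
    while 'X' in cache:
        t += 1
        cache.insert(0, f'new{t}')     # each miss inserts a new MRU block
        if len(cache) > associativity:
            cache.pop()                # evict the LRU block
    return t
```

Under this model the dead block survives exactly `associativity` misses, e.g. 4 misses in a 4-way set but 16 in a 16-way set, matching the observation that the death-to-replacement gap worsens with associativity.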
For highly associative LLCs, true LRU imposes additional complexity and area overhead. That is why most current processors use a pseudo-LRU replacement policy, which has lower hardware complexity than true LRU and performs almost as well. There are other interesting proposals for cache replacement policies (discussed in Section 2), but most of them involve complex mechanisms and hardware overhead for maintaining aging bits, counters, etc. LRU performs well in many applications with high temporal locality [1]. Hence, instead of completely abandoning the concept of LRU, researchers have proposed many innovative techniques to improve the performance of LRU-based policies [3, 5-7]. These proposals retain the concepts of aging and temporal locality while improving the performance of LRU.
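The most common pseudo-LRU variant is the binary-tree scheme, which tracks only associativity-1 bits per set instead of a full recency order. A minimal sketch of tree-based pseudo-LRU (an illustration of the general technique, not any specific processor's implementation) follows; each internal tree bit points toward the less recently used half of the ways.

```python
class TreePLRU:
    """Binary-tree pseudo-LRU for a power-of-two associativity.

    Uses ways-1 tree bits per set; bit value 0 steers victim search left.
    """
    def __init__(self, ways):
        assert ways & (ways - 1) == 0 and ways > 1
        self.ways = ways
        self.bits = [0] * (ways - 1)

    def touch(self, way):
        """On an access to `way`, flip bits on its path to point away from it."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1        # steer future victims right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0        # steer future victims left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the tree bits down to the pseudo-LRU way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo
```

After touching ways 0, 1, 2, 3 in order in a 4-way set, the tree points back at way 0, just as true LRU would; the policies diverge only on some interleaved access patterns, which is the price paid for using 3 bits instead of a full ordering.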
In this paper we propose a mechanism called random-LRU to improve the performance of LRU in the LLC at much lower hardware cost than true LRU as well as other LRU-based policies. Caches using local replacement policies maintain separate replacement hardware for every set. In our proposed policy we do not implement LRU for the entire set. Instead, we divide the ways of each set into two partitions: random partition (RP) and replacement