directories. In later sections, we discuss the system
architecture and the prefetching design for these
nodes, which are distributed in a high-speed
wide-area network environment.
running the Windows 2003 Enterprise Edition
operating system and IIS 6.0. We collected the
disk access trace using the DiskMon toolkit. Note
that the web server serves content to real users,
so the disk I/O comes from real browsing activity.
After recording 2,380,370 disk accesses over 50
hours, including all hits and misses in the local
cache, we observed that many of the disk accesses
follow specific patterns, which result from the
hyperlink relationships among pages and from the
fact that users often have fixed browsing habits.
For example, the access on sector 76120651 has
1,295 occurrences in our traces, and most of them
are close to accesses on sectors 76120707 and
76120735. Although these are not sequential sector
numbers, and several outlying accesses often fall
between them, we can infer that
76120651…76120707…76120735 is a pattern: in
most cases, an access on sector 76120651 indicates
that accesses on 76120707 and 76120735 will
follow soon.
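The kind of pattern inference described above can be sketched as a simple co-occurrence count over the trace. This is only an illustrative sketch, not the chapter's actual method: the function name `mine_follow_patterns`, the window size, and the support threshold are all assumptions introduced here.

```python
from collections import defaultdict

def mine_follow_patterns(trace, window=8, min_support=100):
    """For each sector, count which sectors appear shortly after it
    in the access trace, and keep only frequent successors."""
    followers = defaultdict(lambda: defaultdict(int))
    for i, sector in enumerate(trace):
        # Look at the next `window` accesses after this one.
        for successor in trace[i + 1 : i + 1 + window]:
            if successor != sector:
                followers[sector][successor] += 1
    patterns = {}
    for sector, counts in followers.items():
        frequent = [s for s, c in counts.items() if c >= min_support]
        if frequent:
            # Most frequent successors first.
            patterns[sector] = sorted(frequent, key=counts.get, reverse=True)
    return patterns
```

On a trace where sector 76120651 is repeatedly followed (with interleaved outliers) by 76120707 and 76120735, such a miner would report those two sectors as the likely successors of 76120651.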
Therefore, we can design a prefetching algorithm
based on pattern forecasting, executed by the
memory nodes in RAM Grid. After a number of
accesses fall into the remote cache provided by a
memory node, the node can forecast the disk blocks
most likely to be referenced by subsequent accesses
and actively push them to the user node. Such a
push-based prefetching algorithm overlaps network
communication with computation and boosts system
performance, as illustrated in Figure 1.
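The push-based interaction above can be sketched roughly as follows. The `MemoryNode` and `UserNode` classes, the `max_push` limit, and the in-memory dictionaries are hypothetical simplifications, assuming a pattern table of the kind mined from the trace; they are not the chapter's actual implementation.

```python
class MemoryNode:
    """Sketch of a RAM Grid memory node that serves cached blocks
    and pushes forecast blocks to the user node unprompted."""
    def __init__(self, cache, patterns, max_push=2):
        self.cache = cache          # sector -> block data held in RAM
        self.patterns = patterns    # sector -> likely follow-up sectors
        self.max_push = max_push

    def handle_request(self, user_node, sector):
        # Serve the requested block from the remote cache.
        user_node.receive(sector, self.cache.get(sector))
        # Forecast the most probable follow-up blocks and push them
        # before the user node asks, so the network transfer overlaps
        # with the user node's ongoing computation.
        for nxt in self.patterns.get(sector, [])[: self.max_push]:
            if nxt in self.cache:
                user_node.receive(nxt, self.cache[nxt])

class UserNode:
    """Sketch of a user node that absorbs pushed blocks locally."""
    def __init__(self):
        self.local_cache = {}

    def receive(self, sector, data):
        self.local_cache[sector] = data
```

With the example pattern from the trace, a single request for sector 76120651 would leave 76120707 and 76120735 already sitting in the user node's local cache when the follow-up accesses arrive.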
Compared with the traditional read-ahead
mechanism, the advantages of push-based
prefetching are as follows. First, the user nodes
in RAM Grid are usually burdened with heavy
workloads, while the memory nodes often have
spare CPU cycles; the latter are therefore much
better suited to the forecasting work of prefetching,
and a computationally expensive but precise
prefetching algorithm can be employed. Second,
besides the computational overhead, a prefetching
algorithm may have considerable space
consumption, and the memory nodes have
OVERVIEW
In traditional systems, an actual disk I/O operation
occurs only when a request misses the local file
system cache in the operating system. Sarkar et
al. noted that the cache must be large enough,
otherwise the costly disk accesses will dominate
system performance (Sarkar, et al., 1996). The
effect of RAM Grid, like other remote memory
sharing systems, is to provide abundant memory
resources that serve as an intermediate cache layer
between the local file system cache and the local
disk.
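The resulting three-level read path can be sketched as follows; the function `read_block` and the dictionary-based caches are illustrative assumptions, meant only to show where the remote memory layer sits between the local cache and the disk.

```python
def read_block(sector, local_cache, remote_cache, read_from_disk):
    """Read path with RAM Grid as an intermediate layer:
    local file system cache first, then remote memory, then disk."""
    if sector in local_cache:
        return local_cache[sector]        # local hit: no I/O at all
    if sector in remote_cache:
        data = remote_cache[sector]       # remote hit: one network
        local_cache[sector] = data        # round-trip, no disk seek
        return data
    data = read_from_disk(sector)         # miss: costly disk access
    local_cache[sector] = data
    remote_cache[sector] = data           # populate both layers
    return data
```

A block evicted from the small local cache can thus still be served from remote memory instead of triggering another disk access.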
Another problem of the traditional file system
cache comes from the read-ahead mechanism.
The system often reads several sequential blocks
when only the first block of the sequence is
accessed. We can regard read ahead as a "blind"
pull-based prefetching, whose shortcomings are
two-fold. First, the user node must decide how
many blocks to read ahead, which unnecessarily
consumes extra CPU cycles. Second, reading ahead
on sequential blocks without pattern analysis risks
wasting disk or network bandwidth and memory
buffers, because not all applications access
sequential blocks; this is usually called "cache
pollution". In this paper, we propose push-based
prefetching to solve the first problem, and "smart"
prefetching based on pattern analysis, instead of
a "blind" one, to address the second problem.
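The contrast between blind read ahead and pattern-based prefetching can be made concrete with a small sketch. The function names and the waste metric below are assumptions introduced for illustration; "waste" here is a rough proxy for the cache pollution described above.

```python
def prefetch_blind(sector, depth=4):
    # "Blind" read ahead: assume the next blocks are sequential.
    return [sector + i for i in range(1, depth + 1)]

def prefetch_smart(sector, patterns):
    # Pattern-based prefetch: only blocks the access history supports.
    return patterns.get(sector, [])

def wasted_fetches(trace, prefetch):
    """Count prefetched blocks the application never touches at all --
    a rough proxy for cache pollution and wasted bandwidth."""
    used = set(trace)
    waste = 0
    for sector in trace:
        waste += sum(1 for s in prefetch(sector) if s not in used)
    return waste
```

On the non-sequential pattern from the trace, blind read ahead fetches nothing the application ever uses, while pattern-based prefetch wastes nothing.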
In order to study the behavior of the traditional
file system cache, we collected disk access traces
from a busy production web server with about 2
million page views per day. The server configuration
includes two Intel Pentium 4 3.0 GHz CPUs, 2 GB
of physical memory, and an 80 GB SCSI hard disk,