Unlike existing file-level prefetch policies, DiskSeen accesses disk blocks directly via their logical block numbers (LBNs), including both file content data blocks and metadata blocks such as inode and indirect blocks. The challenge is that, because these blocks are prefetched at the disk level, nothing is known about them except their LBNs: we do not know which file a block belongs to or what type of block it is. Translating LBNs back to file/offset pairs is also cumbersome. To make the LBN-based prefetched blocks usable by high-level I/O routines, we treat a disk partition as a raw device file, read blocks through it, and place them in the prefetching area. Only when a high-level I/O request is issued do we check the LBNs of the requested blocks against those of the prefetched blocks resident in the prefetching area. If a match is found, the prefetched block is moved into the caching area to satisfy the I/O request. This design significantly reduces implementation complexity.
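The request-time LBN matching described above can be illustrated with a small user-space Python model. This is a minimal sketch under stated assumptions, not DiskSeen's kernel implementation: the class name `LbnBufferSketch`, the helper `fake_raw_device`, the 4KB block size, and the use of plain dictionaries for the prefetching and caching areas are all hypothetical. Blocks are stored by LBN only, with no file or block-type information, and a prefetched block is moved into the caching area only when a high-level request for its LBN arrives.

```python
from typing import Callable, Dict, Iterable

BLOCK_SIZE = 4096  # assumed block size; the text does not specify one


class LbnBufferSketch:
    """Hypothetical model of a prefetching area and a caching area keyed by LBN."""

    def __init__(self, read_raw: Callable[[int], bytes]) -> None:
        # read_raw stands in for reading one block from the partition
        # opened as a raw device file (an assumption for illustration).
        self.read_raw = read_raw
        self.prefetching_area: Dict[int, bytes] = {}
        self.caching_area: Dict[int, bytes] = {}
        self.prefetch_hits = 0
        self.disk_reads = 0

    def prefetch(self, lbns: Iterable[int]) -> None:
        """Disk-level prefetch: only the LBN of each block is known."""
        for lbn in lbns:
            if lbn not in self.prefetching_area and lbn not in self.caching_area:
                self.prefetching_area[lbn] = self.read_raw(lbn)
                self.disk_reads += 1

    def read(self, lbn: int) -> bytes:
        """Serve a high-level I/O request for the block at `lbn`."""
        if lbn in self.caching_area:
            return self.caching_area[lbn]
        # The LBN match against prefetched blocks happens only at request time.
        block = self.prefetching_area.pop(lbn, None)
        if block is not None:
            self.prefetch_hits += 1
        else:
            block = self.read_raw(lbn)  # miss: read the block on demand
            self.disk_reads += 1
        self.caching_area[lbn] = block  # hit or miss, the block now lives in the caching area
        return block


def fake_raw_device(lbn: int) -> bytes:
    """Hypothetical raw device: each block is filled with its own LBN."""
    return lbn.to_bytes(8, "little") * (BLOCK_SIZE // 8)


if __name__ == "__main__":
    buf = LbnBufferSketch(fake_raw_device)
    buf.prefetch(range(100, 104))  # four blocks land in the prefetching area
    buf.read(101)                  # matched by LBN and moved to the caching area
    print(buf.prefetch_hits, buf.disk_reads)
```

Keeping the prefetching area keyed by LBN alone mirrors the point made in the text: no back-translation from LBN to file/offset is needed, because the comparison is made with the LBNs of requested blocks. Eviction and replacement in either area are deliberately left out of this sketch.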
Performance Evaluation

Our experimental system is a machine with a 3.0GHz Intel Pentium 4 processor, 512MB of memory, and a Western Digital WD1600JB 160GB 7200RPM hard drive with an 8MB cache. The OS is Red Hat Linux WS4 with the Linux 2.6.11 kernel and the Ext3 file system. For the DiskSeen configuration, T, the access index gap threshold, is set to 2048, and S, which is used to determine the trail extent, is set to 128. All other system configurations use their default values.

Workloads

In order to analyze the performance of DiskSeen in different scenarios, we carefully select six representative data-intensive benchmarks with different access patterns and measure their execution times. The six benchmarks include two synthetic workloads, strided and reversed, and four real-life applications. We briefly introduce the six workloads as follows.

strided - a synthetic program that reads a 1GB file in a strided fashion, reading every other 4KB of data from the beginning to the end of the file. There is a small amount of compute time after each read.

reversed - a synthetic program that sequentially reads a 1GB file from its end to its beginning.

CVS - a version control utility widely used in software development environments. We use the command (cvs -q diff) to compare a user working directory to the CVS repository. Two identical sets of data are stored on the disk, with 50GB of space between them.

diff - a Linux tool that compares files line by line. Similar to CVS, it accesses two data sets.

grep - a text search tool that scans a collection of files for lines containing a match for a given keyword expression.

TPC-H - a widely used decision support benchmark that runs business-oriented queries against a database system. We use PostgreSQL 7.3.18 as the database server, and the data set is generated with scale factor 1. Query 4 is used in the experiments.

To allow analysis of experimental results across the different benchmarks, we use the source code tree of Linux kernel 2.6.11, about 236MB in size, as the data set for the CVS, diff, and grep benchmarks.

Experimental Results
To examine the performance of sequence-based prefetching and history-aware prefetching in DiskSeen, we show in Table 1 the execution times of the benchmarks on the stock Linux kernel, along with the times of their first and second runs on the kernel with the DiskSeen scheme. Note that,