Table 1. Execution times (seconds) of workloads on the stock Linux kernel and on the kernel with DiskSeen
Workload name   Linux 2.6.11   DiskSeen - 1st Run   DiskSeen - 2nd Run
strided         33.99          33.69                24.93
reversed        99.98          100.26               49.17
CVS             81.57          68.22                55.50
diff            98.36          81.14                46.12
grep            17.24          14.02                13.83
TPC-H (Q4)      93.85          88.22                69.43
since there is no history information stored in the
block table, only sequence-based prefetching is
available for the first run with DiskSeen. Between
any two consecutive runs, the buffer cache is emptied
by unmounting the file system to ensure all
blocks are accessed from disk in the next run.
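The relative changes implied by Table 1 can be checked with a short calculation. The sketch below only copies the times from the table and computes the percentage reduction of each DiskSeen run against the stock kernel; the names TIMES, reduction_pct, and summarize are illustrative and do not come from the original study.

```python
# Execution times in seconds, copied from Table 1:
# (Linux 2.6.11, DiskSeen 1st run, DiskSeen 2nd run)
TIMES = {
    "strided": (33.99, 33.69, 24.93),
    "reversed": (99.98, 100.26, 49.17),
    "CVS": (81.57, 68.22, 55.50),
    "diff": (98.36, 81.14, 46.12),
    "grep": (17.24, 14.02, 13.83),
    "TPC-H (Q4)": (93.85, 88.22, 69.43),
}


def reduction_pct(baseline, measured):
    """Percentage reduction of `measured` relative to `baseline`.

    A negative value means the measured run was slower than the baseline.
    """
    return 100.0 * (baseline - measured) / baseline


def summarize(times=TIMES):
    """Map each workload to its (1st run, 2nd run) reduction in percent."""
    return {
        name: (reduction_pct(base, first), reduction_pct(base, second))
        for name, (base, first, second) in times.items()
    }


if __name__ == "__main__":
    for name, (first, second) in summarize().items():
        print(f"{name:12s} 1st run: {first:6.1f}%   2nd run: {second:6.1f}%")
```

Rounded to whole percentages, the second-run reductions for strided and reversed agree with the 27% and 51% quoted in the text.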
Strided and Reversed
As synthetic workloads, strided and reversed represent
two different I/O access patterns to examine
the effectiveness of DiskSeen in extreme cases.
With a non-sequential access pattern,
strided and reversed cannot benefit from sequence-based
prefetching at either the file level or the disk
level. As shown in Table 1, the execution times
of their first runs with DiskSeen are not reduced.
However, they are not meaningfully increased
either (reversed takes 100.26 s versus 99.98 s),
which indicates that DiskSeen introduces
negligible overhead.
When history information is available,
history-aware prefetching is activated during
the second runs of the two benchmarks. As a
result, DiskSeen shows significant reductions in
execution time: 27% for strided and
51% for reversed. This is because history trails
allow DiskSeen to identify the prefetchable blocks. In the
stock Linux kernel, reversed accesses can cause
a full disk rotation to service each request, and
the disk scheduler has little chance to improve such
synchronous disk accesses. In contrast, DiskSeen
can identify the prefetchable blocks and request
a large number of blocks in ascending order of
their LBNs, such that these data blocks can be
prefetched in one disk rotation. This also explains
the performance improvement observed for strided.
It is worth pointing out here that reverse
sequential and forward/backward strided accesses are
not rare in real-life systems, especially in
high-performance computing environments. For example,
both the GPFS file system from IBM (Schmuck
& Haskin, 2002) and the MPI-IO standard (MPI
Forum, 1997) provide special mechanisms for
identifying and handling these cases. In DiskSeen,
such access patterns can be handled well without
extra effort to change file systems.
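The idea of requesting history-identified blocks in ascending LBN order can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not DiskSeen's implementation: the history trail is modeled as a plain list of LBNs in the order a previous run accessed them, a demand read and its batched prefetch are counted as a single disk request, and the names plan_prefetch and count_requests are hypothetical.

```python
def plan_prefetch(history_trail, current_index, window=8):
    """Return the LBNs to prefetch after the access at `current_index`.

    `history_trail` is a list of LBNs in the order a previous run accessed
    them (a hypothetical stand-in for recorded history). The next `window`
    blocks of the trail are returned sorted in ascending LBN order, so they
    can be issued to the disk as one forward sweep.
    """
    upcoming = history_trail[current_index + 1 : current_index + 1 + window]
    return sorted(upcoming)


def count_requests(trail, window):
    """Count disk requests needed to replay `trail`.

    Each cache miss costs one request and (by assumption) carries a batched
    prefetch of the next `window` blocks along the trail.
    """
    cached = set()
    requests = 0
    for i, lbn in enumerate(trail):
        if lbn in cached:
            continue  # served from prefetched data, no disk request
        requests += 1
        cached.update(plan_prefetch(trail, i, window))
    return requests


if __name__ == "__main__":
    reversed_trail = list(range(99, -1, -1))  # LBNs 99, 98, ..., 0
    print(plan_prefetch(reversed_trail, 0, window=4))
    print(count_requests(reversed_trail, window=0))  # no history available
    print(count_requests(reversed_trail, window=9))  # history-aware batches
```

In this model a reversed trail without history needs one request per block, while batching along the trail reduces the request count in proportion to the window size. The sketch counts requests only and does not model rotational timing.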
CVS and Diff
CVS and diff have a similar data access pattern. In
both workloads, two sets of the Linux source code
tree are compared file by file. Such a disk access
pattern is very inefficient: a long seek distance
separates two consecutive disk accesses, so each
disk access incurs a long disk head seek and
rotational latency.
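The cost of alternating between two source trees can be made concrete with a small model of head travel. The sketch below assumes a hypothetical layout in which each file occupies one block, the first tree starts at LBN 0, and the second tree starts `gap` blocks later; head travel is approximated as the sum of absolute LBN differences between consecutive accesses. None of these names or numbers come from the original experiments.

```python
def head_travel(lbns):
    """Sum of absolute LBN distances between consecutive accesses."""
    return sum(abs(b - a) for a, b in zip(lbns, lbns[1:]))


def interleaved_order(n_files, gap):
    """Access order of a file-by-file comparison of two trees.

    File i of the first tree is assumed to sit at LBN i, and file i of the
    second tree at LBN gap + i (hypothetical one-block-per-file layout).
    """
    order = []
    for i in range(n_files):
        order.append(i)        # file i in the first tree
        order.append(gap + i)  # the matching file in the second tree
    return order


def grouped_order(n_files, gap):
    """Access order when each tree is read sequentially, one after the other."""
    return list(range(n_files)) + [gap + i for i in range(n_files)]


if __name__ == "__main__":
    n_files, gap = 4, 1000
    print(head_travel(interleaved_order(n_files, gap)))
    print(head_travel(grouped_order(n_files, gap)))
```

With 4 files and a gap of 1000 blocks, the interleaved order travels 6997 blocks in this model, while reading each tree sequentially travels 1003.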
As shown in Table 1, DiskSeen significantly
improves the performance of both CVS and diff
on the first runs and further on the second runs.
This is because the Linux source code tree accessed
in both workloads mostly consists of small
files, which are laid out on the disk sequentially.
However, the file-level prefetching in the stock
Linux kernel cannot detect sequential disk accesses
across files, and most of these files are
small, so prefetching is only occasionally
 