Exploiting Disk Layout and Block Access History for I/O Prefetch - Advanced Operating Systems and Kernel Applications - page 214

Table 1. Execution times (seconds) of workloads on the stock Linux kernel and on the kernel with DiskSeen

Workload name   Linux 2.6.11   DiskSeen, 1st run   DiskSeen, 2nd run
strided         33.99          33.69               24.93
reversed        99.98          100.26              49.17
CVS             81.57          68.22               55.50
diff            98.36          81.14               46.12
grep            17.24          14.02               13.83
TPC-H (Q4)      93.85          88.22               69.43
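The relative reductions quoted in the text for Table 1 (27% for strided and 51% for reversed on the second run) can be recomputed from the raw times. The following is a minimal sketch; the names `TABLE_1` and `reduction_pct` are illustrative and do not come from the original chapter.

```python
# Execution times in seconds, copied from Table 1:
# (stock Linux 2.6.11, DiskSeen 1st run, DiskSeen 2nd run)
TABLE_1 = {
    "strided": (33.99, 33.69, 24.93),
    "reversed": (99.98, 100.26, 49.17),
    "CVS": (81.57, 68.22, 55.50),
    "diff": (98.36, 81.14, 46.12),
    "grep": (17.24, 14.02, 13.83),
    "TPC-H (Q4)": (93.85, 88.22, 69.43),
}


def reduction_pct(baseline: float, measured: float) -> float:
    """Percentage reduction in execution time relative to the baseline.

    A negative value means the measured run was slower than the baseline.
    """
    return 100.0 * (baseline - measured) / baseline


if __name__ == "__main__":
    for name, (linux, first, second) in TABLE_1.items():
        print(
            f"{name:12s} 1st run: {reduction_pct(linux, first):6.1f}%   "
            f"2nd run: {reduction_pct(linux, second):6.1f}%"
        )
```

Rounded to whole percentages, the second-run values for strided and reversed match the 27% and 51% reported in the text, and the first-run values for both synthetic workloads stay within one percent of the stock kernel.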

since there is no history information stored in the block table, only sequence-based prefetching is available for the first run with DiskSeen. Between any two consecutive runs, the buffer cache is emptied by unmounting the file system to ensure that all blocks are accessed from disk in the next run.

Strided and Reversed

As synthetic workloads, strided and reversed represent two different I/O access patterns used to examine the effectiveness of DiskSeen in extreme cases. Because their access patterns are non-sequential, strided and reversed cannot benefit from sequence-based prefetching at either the file level or the disk level. As shown in Table 1, the execution times of their first runs with DiskSeen are not reduced. However, the execution times are not increased either, which indicates that DiskSeen introduces negligible overhead.

When history information is available, history-aware prefetching is activated during the second runs of the two benchmarks. As a result, DiskSeen significantly reduces execution times: by 27% for strided and 51% for reversed. This is because the history trails identify the prefetchable blocks. In the stock Linux kernel, reversed accesses can cause a full disk rotation to service each request, and the disk scheduler has little chance to improve such synchronous disk accesses. In contrast, DiskSeen can identify the prefetchable blocks and request a large number of blocks in ascending order of their LBNs, so that these data blocks can be prefetched in one disk rotation. This also explains the performance improvement observed for strided.

It is worth pointing out that reverse sequential and forward/backward strided accesses are not rare in real-life systems, especially in high-performance computing environments. For example, both the GPFS file system from IBM (Schmuck & Haskin, 2002) and the MPI-IO standard (MPI Forum, 1997) provide special mechanisms for identifying and handling these cases. In DiskSeen, such access patterns are handled well without any extra effort to change file systems.
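The idea of requesting prefetchable blocks in ascending LBN order can be illustrated with a small sketch. This is not DiskSeen's kernel implementation; the function `plan_prefetch`, its `max_blocks` parameter, and the example trails are hypothetical and serve only to show how a recorded access trail, whatever its original order, could be turned into ascending disk requests.

```python
from typing import Iterable, List, Tuple


def plan_prefetch(trail: Iterable[int], max_blocks: int = 64) -> List[Tuple[int, int]]:
    """Turn a recorded trail of LBNs into ascending (start_lbn, length) runs.

    Hypothetical helper: it keeps the first `max_blocks` distinct LBNs of the
    trail, sorts them in ascending order, and merges adjacent LBNs into runs.
    """
    # Deduplicate while preserving trail order, then cap the batch size.
    batch = list(dict.fromkeys(trail))[:max_blocks]

    runs: List[Tuple[int, int]] = []
    for lbn in sorted(batch):
        if runs and lbn == runs[-1][0] + runs[-1][1]:
            # The block directly follows the previous run, so extend that run.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            runs.append((lbn, 1))
    return runs


if __name__ == "__main__":
    reversed_trail = range(107, 99, -1)   # blocks 107, 106, ..., 100
    strided_trail = range(100, 132, 4)    # blocks 100, 104, ..., 128
    print(plan_prefetch(reversed_trail))
    print(plan_prefetch(strided_trail))
```

Under these assumptions, the reversed trail collapses into a single ascending run of eight blocks. The strided trail stays as separate one-block requests, but they are issued in ascending order, so the disk head never has to move backward.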

CVS and Diff

CVS and diff have similar data access patterns. In both workloads, two sets of the Linux source code tree are compared file by file. This is a very inefficient disk access pattern: a long seek distance separates two consecutive disk accesses, so each access incurs a long head seek and rotational latency.

As shown in Table 1, DiskSeen significantly improves the performance of both CVS and diff on the first runs, and further on the second runs. This is because the Linux source code tree accessed in both workloads consists mostly of small files, which are laid out sequentially on the disk. However, the file-level prefetching in the stock Linux kernel cannot detect sequential disk accesses across files, and most of these files are small, so prefetching is only occasionally
