reclaimed anonymous pages in the hope that they
would be efficiently swapped-in in the same order.
However, the data access pattern in SMM defeats
this effort. The swap-in accesses to the vector
arrays that record the positions of elements in a
matrix turn into random accesses, while the matrix
elements themselves are still accessed sequentially.
This explains why DULO can significantly reduce the
execution time of the program (by up to 38.6%): it
detects the random pages in the vector arrays and
caches them with a higher priority. Because the
matrix is sparse, the vector arrays are not reused
frequently enough for the original kernel to keep
them from being paged out. In addition, the two
kernels show similar execution times when there is
enough memory (more than 424MB) to hold the working
set, as shown in the figure, which suggests that
DULO's overhead is small.
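The idea of caching random pages with a higher priority can be
illustrated with a toy replacement policy. This is a minimal sketch
under stated assumptions, not DULO's actual algorithm: it identifies
each cached page by its disk block number, treats a run of at least
`min_seq_len` consecutive block numbers as sequential, ignores recency
entirely, and evicts sequential pages before random ones. The names
`find_sequences`, `choose_victims`, and `min_seq_len` are hypothetical.

```python
def find_sequences(blocks):
    """Group block numbers into runs of consecutive blocks."""
    runs = []
    for b in sorted(set(blocks)):
        if runs and b == runs[-1][-1] + 1:
            runs[-1].append(b)   # extends the current run
        else:
            runs.append([b])     # starts a new run
    return runs


def choose_victims(cached_blocks, n_evict, min_seq_len=4):
    """Pick up to n_evict blocks to evict, sequential blocks first.

    Blocks in runs shorter than min_seq_len are treated as random
    (expensive to reload) and are evicted only when there are not
    enough sequential blocks.
    """
    runs = find_sequences(cached_blocks)
    sequential = [b for r in runs if len(r) >= min_seq_len for b in r]
    random_blocks = [b for r in runs if len(r) < min_seq_len for b in r]
    return (sequential + random_blocks)[:n_evict]


# Hypothetical cache contents: one long run (100-104) and scattered blocks.
cached = [100, 101, 102, 103, 104, 7, 950, 951, 3000]
victims = choose_victims(cached, n_evict=4)
```

In this sketch the scattered blocks stay in memory as long as
sequential blocks are available to evict, which mirrors why the
randomly accessed vector-array pages benefit from the policy.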
There have been many other techniques to control
the data placement on disk (Arpaci-Dusseau et al.
2003; Black et al. 1991) or to reorganize selected disk
blocks (Hsu et al. 2003), so that related objects
are clustered and the accesses to them become
more sequential. The traxtent-aware file system
excludes track-boundary blocks from allocation to
improve sequential disk access performance
(Schindler et al. 2002). Efforts to improve
access sequentiality by statically arranging the
data layout on the disk are effective only when the
actual accesses take place in the assumed order.
If they do not, or if the access order changes over
time, many random accesses can still occur.
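The dependence of a static layout on the assumed access order can be
shown with a small worked example. The sketch below is an
illustration only: it uses the distance between block positions as a
crude stand-in for seek cost, and the layout, the object names, and the
function `total_seek_distance` are hypothetical.

```python
def total_seek_distance(layout, access_order, start=0):
    """Sum of absolute head movements (in blocks) to serve access_order.

    layout maps an object name to its on-disk block position.
    """
    pos = start
    distance = 0
    for name in access_order:
        distance += abs(layout[name] - pos)
        pos = layout[name]
    return distance


# Hypothetical layout: objects placed contiguously in the order a..e.
layout = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}

# Accesses in the order the layout assumed.
assumed = total_seek_distance(layout, ["a", "b", "c", "d", "e"])
# The same objects accessed in a different order.
changed = total_seek_distance(layout, ["e", "a", "d", "b", "c"])
```

With this layout the head travels 4 blocks in the assumed order and 14
blocks in the changed order, although the data placement is identical.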
Because techniques that focus on the disk alone
cannot fully solve the problem, a complementary
effort, represented by DULO, is to expose the data
layout information to upper-level software, such as
the buffer cache management module in the OS kernel,
so that it can leverage the information in its
policies for higher I/O throughput. Besides DULO,
DiskSeen is another example of such an effort (Ding
et al. 2007). DiskSeen improves the effectiveness of
prefetching by using knowledge of the disk layout
to find on-disk data access sequences. Used in
addition to conventional file-level prefetching,
disk-level prefetching provides substantially
higher I/O performance for many access patterns,
especially for accesses to a large number
of small files. The two efforts, improving the
on-disk layout and exposing it to upper-level
software, are complementary and synergistic.
While statically improving the data layout on
the disk provides the opportunity for long sequences
of data access, leveraging the layout information
in the upper-level software can maximize the
performance potential of sequential access and
minimize the performance penalty incurred by
accessing random data.
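Disk-level prefetching, as opposed to file-level prefetching, can be
sketched as sequence detection over disk block numbers. The code below
is a simplified illustration and not DiskSeen's actual algorithm: it
assumes the prefetcher sees a history of accessed block numbers, and
the function `blocks_to_prefetch` and its parameters `window` and
`depth` are hypothetical.

```python
def blocks_to_prefetch(history, window=3, depth=4):
    """Return block numbers to prefetch, or [] if no sequence is seen.

    A sequence is detected when the last `window` accesses in
    `history` touch strictly consecutive disk block numbers,
    regardless of which files those blocks belong to.
    """
    if len(history) < window:
        return []
    recent = history[-window:]
    if all(b - a == 1 for a, b in zip(recent, recent[1:])):
        last = recent[-1]
        return list(range(last + 1, last + 1 + depth))
    return []


# Blocks 500-502 might belong to three different small files; a
# file-level prefetcher would see three unrelated one-block reads.
hit = blocks_to_prefetch([10, 500, 501, 502])
miss = blocks_to_prefetch([10, 500, 12])
```

Because detection works on block numbers, a run of small files that
happen to be adjacent on disk is recognized as one sequence.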
We believe that exposing more detailed
information about the storage system, such as the
configuration of the disk array, the data layout on a
disk, and the buffer cache size on the storage
controller, to the various software layers of the I/O stack,
RESEARCH ON IMPROVING AND
EXPOSING ON-DISK LAYOUT FOR
UPPER-LEVEL SOFTWARES
Disk head seek time far dominates I/O data transfer
time, and accessing sequential data on the disk can be
an order of magnitude more efficient than accessing
random data. As the hard disk has been, and is
expected to remain, the mainstream on-line
storage device for the foreseeable future, efforts
to ensure that on-disk data are accessed sequentially
are critical to maintaining high I/O performance.
Exposing information from the lower
layers for better utilization of the hard disk is an
active research topic.
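The gap between sequential and random access can be made concrete with
a back-of-the-envelope calculation. All numbers below are assumed for
illustration and are not taken from the text: a 64 KB request, a
100 MB/s transfer rate, an 8 ms average seek, and a 4 ms average
rotational delay. The function `request_time_ms` is hypothetical.

```python
def request_time_ms(size_bytes, transfer_mb_per_s, positioning_ms):
    """Service time of one request: positioning delay plus transfer."""
    transfer_ms = size_bytes / (transfer_mb_per_s * 1e6) * 1e3
    return positioning_ms + transfer_ms


SIZE = 64 * 1024  # assumed request size in bytes

# Sequential access: the head is already in place, so no positioning.
sequential_ms = request_time_ms(SIZE, 100, 0.0)
# Random access: assumed 8 ms seek plus 4 ms rotational delay.
random_ms = request_time_ms(SIZE, 100, 8.0 + 4.0)

ratio = random_ms / sequential_ms
```

Under these assumptions a random request takes roughly 12.7 ms against
about 0.66 ms for a sequential one, a ratio of more than ten to one.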
Most of the existing work focuses on using
disk-specific knowledge to improve data placement
on disk so that future requests can be serviced
efficiently. For example, the Fast File System
(FFS) and its variants allocate related data and
metadata in the same cylinder group to minimize
seeks (McKusick et al. 1994; Ganger et al. 1997).
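The idea of keeping related data in one cylinder group can be sketched
with a toy allocator. This is a simplified illustration, not the FFS
implementation: it assumes that "related" means "in the same
directory", picks the group with the most free blocks for a new
directory, and spills elsewhere only when the home group is full. The
class `ToyAllocator` and its methods are hypothetical.

```python
class ToyAllocator:
    """Toy block allocator that keeps a directory's data in one group."""

    def __init__(self, n_groups, blocks_per_group):
        self.free = [blocks_per_group] * n_groups  # free blocks per group
        self.dir_group = {}                        # directory -> home group

    def _emptiest_group(self):
        # On ties, max() returns the lowest-numbered group.
        return max(range(len(self.free)), key=lambda i: self.free[i])

    def allocate(self, directory, n_blocks):
        """Reserve n_blocks for a file in directory; return the group used."""
        if directory not in self.dir_group:
            # New directory: start it in the emptiest group.
            self.dir_group[directory] = self._emptiest_group()
        group = self.dir_group[directory]
        if self.free[group] < n_blocks:
            # Home group is full: spill to the emptiest group instead.
            group = self._emptiest_group()
            if self.free[group] < n_blocks:
                raise RuntimeError("no group has enough free blocks")
        self.free[group] -= n_blocks
        return group


alloc = ToyAllocator(n_groups=3, blocks_per_group=10)
g1 = alloc.allocate("/home", 4)
g2 = alloc.allocate("/home", 2)   # same directory, same group
g3 = alloc.allocate("/var", 3)    # new directory, different group
```

Files in the same directory land in the same group, so reading them
together requires little head movement between groups.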