is sequential readahead, which has long been
a standard practice among operating systems
(Feiertag and Organick, 1971; McKusick, Joy,
Leffler, and Fabry, 1984). There are also more
comprehensive works to mine correlations
between files (Kroeger and Long, 2001; Paris,
Amer, and Long, 2003; Whittle, Paris, Amer,
Long, and Burns, 2003) or data blocks (Li, Chen,
and Zhou, 2005).
On the other hand, informed prefetching works
with hints from individual applications about their
future I/O operations. The hints can either be
explicitly controlled by the application (Cao, Felten,
Karlin, and Li, 1996; Patterson, Gibson, Ginting,
Stodolsky, and Zelenka, 1995; Patterson, Gibson,
and Satyanarayanan, 1993) or be generated
automatically (Brown, Mowry, and Krieger, 2001).
Caching is another ubiquitous performance
optimization technique. It is common practice to
share prefetching memory with cache memory, which
opens the door to interactions between prefetching
and caching. Besides the basic defensive support
for readahead thrashing discussed in this chapter,
there are also comprehensive works dealing with
integrated prefetching and caching (Butt, Gniady,
and Hu, 2005; Cao, Felten, Karlin, and Li, 1995;
Cao, Felten, Karlin, and Li, 1996; Dini, Lettieri,
and Lopriore, 2006; Gill and Modha, 2005; Pat-
terson, Gibson, Ginting, Stodolsky, and Zelenka,
1995; Itshak and Wiseman, 2008) and schemes to
dynamically adapt the prefetching memory (Li and
Shen, 2005) or prefetch depth (Gill and Bathen,
2007; Li, Shen, and Papathanasiou, 2007; Liang,
Jiang, and Zhang, 2007).
Different storage devices, disk array configura-
tions, and workloads have different optimal I/O
sizes. Some applications (e.g. FTP) are not sensi-
tive to I/O latency and can therefore safely use
a large readahead size; others (e.g. VOD) may
be sensitive to I/O latency and should use a
more conservative I/O size.
Besides the tradeoff between throughput and
latency, the prefetching hit ratio is another common
design consideration. It is defined as the fraction
of data pages faulted in by the prefetching algorithm
that are actually accessed before being reclaimed.
To keep the prefetching hit ratio high, an adaptive
readahead size should be employed, because even
when we are sure that an application is doing
sequential reads, we have no idea how long the read
sequence will last. For example, an application may
scan one file from beginning to end, while accessing
only the first two pages of another file.
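The hit-ratio definition above reduces to a ratio of two counters. A minimal sketch follows; the function and counter names are ours, purely illustrative, and not taken from any particular kernel:

```python
def prefetch_hit_ratio(pages_prefetched, pages_accessed_before_reclaim):
    """Fraction of prefetched pages that were actually used.

    pages_prefetched: total pages faulted in by the prefetching algorithm
    pages_accessed_before_reclaim: how many of those pages were read
        at least once before being reclaimed from memory
    """
    if pages_prefetched == 0:
        return 0.0  # nothing prefetched, nothing wasted
    return pages_accessed_before_reclaim / pages_prefetched

# e.g. 96 of 128 prefetched pages were used before reclaim
print(prefetch_hit_ratio(128, 96))  # 0.75
```

A low value of this ratio directly measures the memory and disk bandwidth wasted on pages that were fetched but never read.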
Fortunately, common I/O behaviors are to some
extent inferable. First, the number of pages still
to be read (setting aside the end-of-file case) is
normally positively correlated with the number of
pages already accessed. Second, the larger the read
size, the more likely the pattern is to be repeated,
because a large read size implies an optimization
made by the developer for an expected long run
of reads. Based on these two empirical rules, the
probability that the current access pattern will be
repeated can be estimated, from which an adaptive
prefetching size can be calculated.
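One way to turn the two empirical rules into an adaptive size is a capped multiplicative ramp-up: grow the readahead window with the length of the run observed so far and with the application's own read size. The sketch below is a hypothetical illustration of this idea, not the actual algorithm of any specific kernel; the constants are assumed values:

```python
MAX_READAHEAD_PAGES = 32  # assumed cap, e.g. 128KB of 4KB pages

def next_readahead_size(pages_accessed_so_far, last_read_size):
    """Hypothetical adaptive sizing based on the two empirical rules:
    (1) the more pages already accessed, the longer the remaining run
        is expected to be;
    (2) a larger application read size makes repetition more likely,
        so it justifies a more aggressive window.
    Ramps up multiplicatively, capped at MAX_READAHEAD_PAGES."""
    estimate = max(2 * pages_accessed_so_far, 4 * last_read_size)
    return min(max(estimate, 1), MAX_READAHEAD_PAGES)

# a short run ramps up gently; a long run saturates at the cap
print(next_readahead_size(2, 1))    # 4
print(next_readahead_size(100, 8))  # 32
```

The cap bounds the damage when the estimate is wrong: a mispredicted window wastes at most MAX_READAHEAD_PAGES pages, which keeps the hit ratio from collapsing on short runs.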
The prefetching hit ratio has served as one
major goal in the design of prefetching algorithms.
A low hit ratio wastes memory and disk I/O
resources, a cost that was unacceptable in the days
when these resources were scarce and precious.
Traditional prefetching algorithms therefore tend
to prefetch only for strictly sequential reads.
They employ conservative policies for both
prefetching size and sequential detection, seeking a
high prefetching hit ratio. For example, Linux 2.6
has used a conservative 128KB readahead size
since its infancy.
Design Tradeoffs of Prefetching
The prefetching size can greatly impact I/O
performance and is considered the main I/O
parameter. One must trade off between throughput
and latency when determining its value. The general
guideline is: the prefetching size should be large
enough to deliver good I/O throughput, yet small
enough to prevent undesirably long I/O latencies.
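This tradeoff can be illustrated with a simple disk model in which each request pays a fixed positioning cost plus a transfer time proportional to its size. The seek time and bandwidth figures below are assumed round numbers for illustration, not measurements:

```python
SEEK_MS = 8.0          # assumed average seek + rotational delay
BANDWIDTH_MB_S = 80.0  # assumed sequential transfer rate

def request_latency_ms(size_kb):
    """Per-request latency: fixed positioning cost plus transfer time."""
    transfer_ms = size_kb / 1024 / BANDWIDTH_MB_S * 1000
    return SEEK_MS + transfer_ms

def effective_throughput_mb_s(size_kb):
    """Sustained throughput when requests are issued back to back."""
    return (size_kb / 1024) / (request_latency_ms(size_kb) / 1000)

# A larger prefetching size amortizes the seek cost over more data,
# raising throughput, but each individual request takes longer.
for size_kb in (128, 1024):
    print(size_kb, request_latency_ms(size_kb),
          effective_throughput_mb_s(size_kb))
```

In this model, going from 128KB to 1024KB requests roughly quadruples sustained throughput while more than doubling per-request latency, which is exactly why latency-sensitive workloads such as VOD favor the smaller, more conservative size.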