Sequential File Prefetching in Linux - Advanced Operating Systems and Kernel Applications

Information Technology Reference

In-Depth Information

Figure 5. For a typical readahead window, async_size equals to size, and PG_readahead lies in the

first page

be a great challenge to the readahead algorithm.

The interleaving of several sequential streams

may be perceived as random reads.

In theory, it is possible to track the state of

each and every stream. However the cost would

be unacceptable: since each read request may start

a new stream, we will end up remembering every

read in the case of random reads. ZFS takes this

approach but limits the number of streams it can

track to a small value. The implementation also

has to deal with the complexity to dynamically al-

locate, lookup, age and free the stream objects.

Linux takes a simple and stupid approach: to

statically allocate one readahead data structure

for each file descriptor. This is handy in serving

the common case of one stream per file descrip-

tor, however it leaves the support for multiple

interleaved streams an open question.

When there are multiple streams per file de-

scriptor, the streams will contend the single space

slot for read offset and readahead window, and

end up overwriting each other's state. This gives

rise to two problems. Firstly, given an unreliable

prev_offset, how do we know the current read

request is a sequential one and therefore should

be prefetched? Secondly, given an unreliable

readahead window, how do we know it is valid for

the current stream and if not, how can we recover

it? The first problem has in fact been addressed in

the previous subsection. It is also straightforward

to validate the ownership of a readahead window:

if it contains the current PG_readahead page, then

it is the readhead window for the current stream.

Otherwise we have to seek alternative ways for

finding out the lost information about readahead

window.

Fortunately, we found that the lost readahead

window information can be inferred from the page

cache. Figure 5 shows the typical status of a page

cache. The application hits the PG_readahead

page and is triggering an interleaved readahead.

To recover the readahead window information,

we scan forward in the page cache from the cur-

rent page to the first missing page. The number

of pages in between is exactly async_size. As we

know that async_size equals to size for all subse-

quent readaheads and at least size/2 for the initial

readahead, we can safely take async_size as size,

and calculate the next readahead size from it.

readahead Sequence

Assume the sequence of readahead for a sequen-

tial read stream to be A0, A1, A2, … We call

A0 the “initial readahead” and A1, A2, … the

“subsequent readaheads”. The initial readahead is

typically synchronous in that it contains the cur-

rently requested page and the reader have to stop

and wait for the readahead I/O. The subsequent

readaheads can normally be asynchronous in that

the readahead pages are not immediately needed

by the application.

A typical readahead sequence starts with a

small initial readahead size, which is inferred

Advanced Operating Systems and Kernel Applications

Search WWH ::

Custom Search

Home