7.2.1 Scheduling Sequential Applications on Multiprocessors
Consider a server handling a very large number of web requests. A common
software architecture for servers is to allocate a separate thread for each user
connection. Each thread consults a shared data structure to see which portions
of the requested data are cached, and fetches any missing elements from disk.
The thread then spools the result out across the network.
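A minimal sketch of this thread-per-connection pattern, assuming POSIX sockets and pthreads; handle_request() is a hypothetical placeholder for the cache lookup, disk fetch, and network spooling described above:

#include <pthread.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/socket.h>

static void handle_request(int fd) {
    /* Hypothetical: consult the shared cache, fetch any missing
     * data from disk, and spool the result out over the connection. */
    (void)fd;
}

static void *connection_thread(void *arg) {
    int fd = (int)(intptr_t)arg;
    handle_request(fd);     /* mostly I/O-bound: short CPU bursts between waits */
    close(fd);
    return NULL;
}

void serve(int listen_fd) {
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0)
            continue;
        pthread_t t;        /* one thread per user connection */
        if (pthread_create(&t, NULL, connection_thread, (void *)(intptr_t)fd) == 0)
            pthread_detach(t);
        else
            close(fd);
    }
}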
How should the operating system schedule these server threads? Each thread
is I/O-bound, repeatedly reading or writing data to disk and the network, and
therefore makes many small trips through the processor. Some requests may
require more computation; to keep average response time low, we will want to
favor short tasks.
A simple approach would be to use a centralized multi-level feedback queue,
with a lock to ensure only one processor at a time is reading or modifying the
data structure. Each idle processor takes the next task off the MFQ and runs it.
As the disk or network finishes requests, threads waiting on I/O are put back
on the MFQ and executed by the next processor that becomes idle.
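A minimal sketch of this centralized design, assuming a simple linked-list MFQ protected by a single pthread mutex; the mfq helpers, struct fields, and entry points are illustrative, not an actual kernel interface:

#include <pthread.h>
#include <stddef.h>

#define NUM_LEVELS 4

struct thread {
    struct thread *next;
    int priority;           /* 0 (highest) .. NUM_LEVELS-1 (lowest) */
};

struct mfq {
    struct thread *level[NUM_LEVELS];   /* one run queue per priority level */
};

static pthread_mutex_t mfq_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mfq global_mfq;           /* one shared queue for the whole machine */

/* Pop the highest-priority runnable thread, or NULL if none. */
static struct thread *mfq_pop_highest(struct mfq *q) {
    for (int i = 0; i < NUM_LEVELS; i++) {
        if (q->level[i] != NULL) {
            struct thread *t = q->level[i];
            q->level[i] = t->next;
            return t;
        }
    }
    return NULL;
}

/* Make t runnable again at its priority level (simplified: pushes at
 * the head; a real MFQ appends at the tail of the level's queue). */
static void mfq_push(struct mfq *q, struct thread *t) {
    t->next = q->level[t->priority];
    q->level[t->priority] = t;
}

/* Called by each idle processor: all of them serialize on mfq_lock. */
struct thread *schedule_next(void) {
    pthread_mutex_lock(&mfq_lock);
    struct thread *t = mfq_pop_highest(&global_mfq);
    pthread_mutex_unlock(&mfq_lock);
    return t;                           /* NULL means stay idle */
}

/* Called when a disk or network request completes for a blocked thread. */
void io_completed(struct thread *t) {
    pthread_mutex_lock(&mfq_lock);
    mfq_push(&global_mfq, t);
    pthread_mutex_unlock(&mfq_lock);
}

Note that every idle processor and every I/O completion funnels through mfq_lock, which is exactly what causes the problems listed next.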
There are several potential performance problems with this approach:
Contention for the MFQ lock. Depending on how much computation
each thread does before blocking on I/O, the centralized lock may become
a bottleneck, particularly as the number of processors increases.
Cache Coherence Overhead. Although only a modest number of in-
structions are needed for each visit to the MFQ, each processor will need
to fetch the current state of the MFQ from the cache of the processor
that last held the lock. On a uniprocessor, the scheduling data structure
would likely already be in the cache, but with a single copy of the
scheduling data structure on a multiprocessor, the data is unlikely to be
cached on the processor that needs it. Fetching remote data can slow the
processor down by two to three orders of magnitude, and since this occurs
while holding the MFQ lock, the lock becomes even more of a bottleneck.
Limited Cache Reuse. Threads often run on a different processor each
time they are scheduled, so that they must refetch any data they may have
previously cached. Even on a uniprocessor, some of the thread's data will
have been displaced from the cache during the time it was blocked, but
on-chip caches are so large today that much of the thread's data may
remain cached.
For these reasons, commercial operating systems such as Linux use a per-
processor data structure: a separate copy of the multi-level feedback queue for
each processor.
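A minimal sketch of the per-processor variant, reusing the illustrative struct mfq and mfq_pop_highest() from the sketch above; sched_getcpu() is the glibc call that returns the id of the processor executing the caller, and each lock would be set up with pthread_mutex_init() at boot (omitted here):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define MAX_CPUS 64

static struct per_cpu_sched {
    pthread_mutex_t lock;   /* protects only this processor's queue */
    struct mfq queue;       /* this processor's private copy of the MFQ */
} sched_state[MAX_CPUS];

struct thread *schedule_next_local(void) {
    struct per_cpu_sched *s = &sched_state[sched_getcpu()];
    pthread_mutex_lock(&s->lock);       /* almost never contended */
    struct thread *t = mfq_pop_highest(&s->queue);
    pthread_mutex_unlock(&s->lock);     /* queue data stays in this CPU's cache */
    return t;
}

Because each processor normally touches only its own queue and lock, scheduling decisions on different processors no longer contend or bounce cache lines; keeping a thread on its last processor's queue also preserves whatever state the thread left in that processor's cache, addressing the cache-reuse problem above.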