7.2.1 Scheduling Sequential Applications on Multiprocessors
Consider a server handling a very large number of web requests. A common
software architecture for servers is to allocate a separate thread for each user
connection. Each thread consults a shared data structure to see which portions
of the requested data are cached, and fetches any missing elements from disk.
The thread then spools the result out across the network.
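A minimal sketch of this thread-per-connection pattern, assuming POSIX sockets and pthreads; handle_request() is a hypothetical placeholder for the cache lookup, disk fetch, and network spooling described above:

#include <pthread.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/socket.h>

static void handle_request(int fd) {
    /* Hypothetical: consult the shared cache, fetch any missing
     * data from disk, and spool the result out over the connection. */
    (void)fd;
}

static void *connection_thread(void *arg) {
    int fd = (int)(intptr_t)arg;
    handle_request(fd);     /* mostly I/O-bound: short CPU bursts between waits */
    close(fd);
    return NULL;
}

void serve(int listen_fd) {
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0)
            continue;
        pthread_t t;        /* one thread per user connection */
        if (pthread_create(&t, NULL, connection_thread, (void *)(intptr_t)fd) == 0)
            pthread_detach(t);
        else
            close(fd);
    }
}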
How should the operating system schedule these server threads? Each thread
is I/O-bound, repeatedly reading or writing data to disk and the network, and
therefore makes many small trips through the processor. Some requests may
require more computation; to keep average response time low, we will want to
favor short tasks.
A simple approach would be to use a centralized multi-level feedback queue,
with a lock to ensure only one processor at a time is reading or modifying the
data structure. Each idle processor takes the next task off the MFQ and runs it.
As the disk or network finishes requests, threads waiting on I/O are put back
on the MFQ and executed by the next processor that becomes idle.
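A minimal sketch of this centralized design, assuming a simple linked-list MFQ protected by a single pthread mutex; the mfq helpers, struct fields, and entry points are illustrative, not an actual kernel interface:

#include <pthread.h>
#include <stddef.h>

#define NUM_LEVELS 4

struct thread {
    struct thread *next;
    int priority;           /* 0 (highest) .. NUM_LEVELS-1 (lowest) */
};

struct mfq {
    struct thread *level[NUM_LEVELS];   /* one run queue per priority level */
};

static pthread_mutex_t mfq_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mfq global_mfq;           /* one shared queue for the whole machine */

/* Pop the highest-priority runnable thread, or NULL if none. */
static struct thread *mfq_pop_highest(struct mfq *q) {
    for (int i = 0; i < NUM_LEVELS; i++) {
        if (q->level[i] != NULL) {
            struct thread *t = q->level[i];
            q->level[i] = t->next;
            return t;
        }
    }
    return NULL;
}

/* Make t runnable again at its priority level (simplified: pushes at
 * the head; a real MFQ appends at the tail of the level's queue). */
static void mfq_push(struct mfq *q, struct thread *t) {
    t->next = q->level[t->priority];
    q->level[t->priority] = t;
}

/* Called by each idle processor: all of them serialize on mfq_lock. */
struct thread *schedule_next(void) {
    pthread_mutex_lock(&mfq_lock);
    struct thread *t = mfq_pop_highest(&global_mfq);
    pthread_mutex_unlock(&mfq_lock);
    return t;                           /* NULL means stay idle */
}

/* Called when a disk or network request completes for a blocked thread. */
void io_completed(struct thread *t) {
    pthread_mutex_lock(&mfq_lock);
    mfq_push(&global_mfq, t);
    pthread_mutex_unlock(&mfq_lock);
}

Note that every idle processor and every I/O completion funnels through mfq_lock, which is exactly what causes the problems listed next.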
There are several potential performance problems with this approach:
Contention for the MFQ lock. Depending on how much computation
each thread does before blocking on I/O, the centralized lock may become
a bottleneck, particularly as the number of processors increases.
Cache Coherence Overhead. Although only a modest number of in-
structions are needed for each visit to the MFQ, each processor will need
to fetch the current state of the MFQ from the cache of the processor
that last held the lock. On a uniprocessor, the scheduling data structure
would likely already be in the cache, but with a single copy of the
scheduling data structure on a multiprocessor, the data is unlikely to be
cached on the processor that needs it. Fetching remote data can slow the
processor down by two to three orders of magnitude, and since this occurs
while holding the MFQ lock, the lock becomes even more of a bottleneck.
Limited Cache Reuse. Threads often run on a different processor each
time they are scheduled, so that they must refetch any data they may have
previously cached. Even on a uniprocessor, some of the thread's data will
have been displaced from the cache during the time it was blocked, but
on-chip caches are so large today that much of the thread's data may
remain cached.
For these reasons, commercial operating systems such as Linux use a per-
processor data structure: a separate copy of the multi-level feedback queue for
each processor.
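A minimal sketch of the per-processor variant, reusing the illustrative struct mfq and mfq_pop_highest() from the sketch above; sched_getcpu() is the glibc call that returns the id of the processor executing the caller, and each lock would be set up with pthread_mutex_init() at boot (omitted here):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define MAX_CPUS 64

static struct per_cpu_sched {
    pthread_mutex_t lock;   /* protects only this processor's queue */
    struct mfq queue;       /* this processor's private copy of the MFQ */
} sched_state[MAX_CPUS];

struct thread *schedule_next_local(void) {
    struct per_cpu_sched *s = &sched_state[sched_getcpu()];
    pthread_mutex_lock(&s->lock);       /* almost never contended */
    struct thread *t = mfq_pop_highest(&s->queue);
    pthread_mutex_unlock(&s->lock);     /* queue data stays in this CPU's cache */
    return t;
}

Because each processor normally touches only its own queue and lock, scheduling decisions on different processors no longer contend or bounce cache lines; keeping a thread on its last processor's queue also preserves whatever state the thread left in that processor's cache, addressing the cache-reuse problem above.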