A Multiprogramming And OS Workload
Our next study is a multiprogrammed workload consisting of both user activity and OS
activity. The workload used is two independent copies of the compile phases of the Andrew
benchmark, a benchmark that emulates a software development environment. The compile phase
consists of a parallel version of the Unix “make” command executed using eight processors.
The workload runs for 5.24 seconds on eight processors, creating 203 processes and
performing 787 disk requests on three different file systems. The workload is run with
128 MB of memory, and no paging activity takes place.
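To put the workload figures in perspective, the following minimal sketch turns them into per-second rates. It also applies the fixed 3 ms disk latency assumed for the disk system in this section to estimate how long the disk requests would take if none of them overlapped. That no-overlap case is only a back-of-the-envelope assumption for illustration, not something the measurements state. The variable and function names are hypothetical.

```python
# Figures quoted in the text.
RUN_SECONDS = 5.24
PROCESSES_CREATED = 203
DISK_REQUESTS = 787
# Fixed disk access latency assumed for the disk system in this section.
DISK_LATENCY_SECONDS = 0.003


def per_second(count, seconds=RUN_SECONDS):
    """Average rate of an event over the whole run."""
    return count / seconds


process_rate = per_second(PROCESSES_CREATED)
disk_request_rate = per_second(DISK_REQUESTS)

# Assumption for illustration only: every request is fully serialized,
# so the latencies simply add up.
serialized_disk_seconds = DISK_REQUESTS * DISK_LATENCY_SECONDS
serialized_disk_fraction = serialized_disk_seconds / RUN_SECONDS

print(f"processes created per second: {process_rate:.1f}")
print(f"disk requests per second:     {disk_request_rate:.1f}")
print(f"serialized disk time:         {serialized_disk_seconds:.3f} s "
      f"({serialized_disk_fraction:.0%} of the run)")
```

Under the no-overlap assumption the disk latency alone would account for a large share of the run, which is consistent with the text's description of the workload as I/O intensive.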
The workload has three distinct phases: compiling the benchmarks, which involves
substantial compute activity; installing the object files in a library; and removing the
object files. The last phase is completely dominated by I/O, and only two processes are
active (one for each of the runs). In the middle phase, I/O also plays a major role, and
the processor is largely idle. The overall workload is much more system and I/O intensive
than the highly tuned commercial workload.
For the workload measurements, we assume the following memory and I/O systems:
Level 1 instruction cache: 32 KB, two-way set associative with a 64-byte block, 1 clock cycle hit time.
Level 1 data cache: 32 KB, two-way set associative with a 32-byte block, 1 clock cycle hit time. We vary the L1 data cache to examine its effect on cache behavior.
Level 2 cache: 1 MB unified, two-way set associative with a 128-byte block, 10 clock cycle hit time.
Main memory: Single memory on a bus with an access time of 100 clock cycles.
Disk system: Fixed-access latency of 3 ms (less than normal to reduce idle time).
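The parameters above determine each cache's geometry and, together with miss rates, a rough average memory access time. The sketch below is a minimal illustration, not the simulator's model. The `Cache` class and `amat` function are hypothetical names. The L2 local miss rate of 20% is an invented value chosen only for the example. Treating the 100-cycle memory access time as a penalty added on top of the L2 hit time is a simplifying assumption. The 1.7% L1 miss rate is the OS instruction miss rate this section reports for a 32 KB cache.

```python
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB


@dataclass(frozen=True)
class Cache:
    name: str
    size_bytes: int
    ways: int          # set associativity
    block_bytes: int
    hit_cycles: int

    @property
    def num_blocks(self) -> int:
        return self.size_bytes // self.block_bytes

    @property
    def num_sets(self) -> int:
        return self.num_blocks // self.ways


# Configuration taken from the list above.
L1I = Cache("L1 instruction", 32 * KB, 2, 64, 1)
L1D = Cache("L1 data", 32 * KB, 2, 32, 1)
L2 = Cache("L2 unified", 1 * MB, 2, 128, 10)
MEMORY_CYCLES = 100


def amat(l1, l2, l1_miss_rate, l2_local_miss_rate, memory_cycles=MEMORY_CYCLES):
    """Average memory access time in clock cycles for a two-level hierarchy.

    Simplifying assumption: an L2 miss costs the L2 hit time plus the
    full memory access time, with no overlap or bus contention.
    """
    l1_miss_penalty = l2.hit_cycles + l2_local_miss_rate * memory_cycles
    return l1.hit_cycles + l1_miss_rate * l1_miss_penalty


for cache in (L1I, L1D, L2):
    print(f"{cache.name}: {cache.num_blocks} blocks, {cache.num_sets} sets")

# 0.017 is the OS instruction miss rate quoted in this section;
# 0.20 is a made-up L2 local miss rate used only for illustration.
print(f"example AMAT: {amat(L1I, L2, 0.017, 0.20):.2f} cycles")
```

Although the two L1 caches have the same capacity and associativity, the smaller block size gives the data cache twice as many sets as the instruction cache.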
Figure 5.16 shows how the execution time breaks down for the eight processors using the
parameters just listed. Execution time is broken down into four components:
1. Idle: Execution in the kernel mode idle loop
2. User: Execution in user code
3. Synchronization: Execution or waiting for synchronization variables
4. Kernel: Execution in the OS that is neither idle nor in synchronization access
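As a minimal sketch of how such a four-way breakdown could be tallied, the code below sums cycle counts by category and converts them to percentages of total execution time. The function name and the sample cycle counts are hypothetical. The sample numbers are invented round values and are not the data behind Figure 5.16.

```python
from collections import Counter

# The four components of execution time listed above.
CATEGORIES = ("idle", "user", "synchronization", "kernel")


def breakdown(samples):
    """Return the percentage of total cycles spent in each category.

    `samples` is an iterable of (category, cycles) pairs, for example one
    pair per processor per measurement interval.
    """
    totals = Counter()
    for category, cycles in samples:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        totals[category] += cycles
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no cycles recorded")
    return {c: 100.0 * totals[c] / grand_total for c in CATEGORIES}


# Invented sample data, for illustration only.
sample = [
    ("user", 300),
    ("kernel", 200),
    ("idle", 250),
    ("idle", 200),
    ("synchronization", 50),
]
result = breakdown(sample)
for category in CATEGORIES:
    print(f"{category:16s}{result[category]:5.1f}%")
```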
FIGURE 5.16 The distribution of execution time in the multiprogrammed parallel
“make” workload. The high fraction of idle time is due to disk latency when only one of the
eight processors is active. These data and the subsequent measurements for this workload
were collected with the SimOS system [Rosenblum et al. 1995]. The actual runs and data
collection were done by M. Rosenblum, S. Herrod, and E. Bugnion of Stanford University.
This multiprogramming workload has a significant instruction cache performance loss, at
least for the OS. The instruction cache miss rate in the OS for a 64-byte block size, two-way
set associative cache varies from 1.7% for a 32 KB cache to 0.2% for a 256 KB cache. User-level
 