24.5.1.1 Checkpointing Too Frequently
In the case where the user writes a checkpoint file about every 90
minutes, all I/O activity is represented by the benchmark tests (as shown
in Figure 24.4). The write I/O achieves a high rate, as indicated by the
periodic blue dots. However, the user is checkpointing very frequently and
spent 80% of the time on I/O (Figure 24.3). Based on the mean time between
failures of the Hopper system, the user could checkpoint once every 8 to
10 hours, reducing the amount of time spent in I/O.
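The text does not say how the 8 to 10 hour recommendation was derived from the mean time between failures. As a rough illustration only, one common rule of thumb is Young's first-order approximation, which sets the compute time between checkpoints to sqrt(2 x checkpoint cost x MTBF). The sketch below assumes that rule. The function names and the numeric inputs are hypothetical and are not Hopper measurements.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Compute time between checkpoints per Young's approximation.

    Assumption: this rule of thumb is used for illustration; the source
    does not state which method produced its 8 to 10 hour figure.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def checkpoint_overhead(checkpoint_cost_s: float, interval_s: float) -> float:
    """Fraction of wall-clock time spent writing checkpoints.

    One cycle is interval_s of computation followed by one checkpoint
    write. Failures and restarts are ignored.
    """
    if checkpoint_cost_s < 0 or interval_s <= 0:
        raise ValueError("cost must be non-negative and interval positive")
    return checkpoint_cost_s / (interval_s + checkpoint_cost_s)


if __name__ == "__main__":
    # Hypothetical inputs: a 10-minute checkpoint write and a 24-hour MTBF.
    cost_s = 10 * 60
    mtbf_s = 24 * 3600
    interval_s = young_interval(cost_s, mtbf_s)
    print(f"suggested interval: {interval_s / 3600:.2f} h")
    print(f"I/O overhead at that interval: "
          f"{checkpoint_overhead(cost_s, interval_s):.1%}")
```

With real values, the checkpoint cost would come from the measured write time (for example, from Darshan statistics) and the MTBF from the system's failure history. A longer MTBF or a more expensive checkpoint both lengthen the suggested interval.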
FIGURE 24.3: Darshan statistics of a job that is checkpointing too frequently.
FIGURE 24.4: Activities on the Hopper XE-6 file system during a user run.
The user is frequently checkpointing, causing the I/O activity to be high.
 