to allow for a higher degree of isolation between independent streams of I/O
requests at all layers. For instance, I/O requests originating from different
contexts should be kept separate while traversing the I/O hierarchy, to avoid
generating mixed I/O patterns.
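A minimal sketch of what such separation could look like in practice follows; the data structures and names below are purely illustrative assumptions, not an existing kernel or library interface. Each request carries the identity of its originating context and is appended to a queue owned by that context, so requests from different contexts are never interleaved within a single queue.

#include <stdint.h>
#include <stddef.h>

#define MAX_CONTEXTS 64
#define QUEUE_DEPTH  128

/* One I/O request, tagged with the context (e.g., process or cgroup) that issued it. */
struct io_request {
    uint32_t context_id;
    uint64_t offset;
    uint32_t length;
};

/* A per-context submission queue: requests from different contexts never share a queue. */
struct context_queue {
    struct io_request ring[QUEUE_DEPTH];
    size_t head, tail;
};

static struct context_queue queues[MAX_CONTEXTS];

/* Route a request to the queue owned by its originating context. */
static int submit_request(const struct io_request *req)
{
    struct context_queue *q = &queues[req->context_id % MAX_CONTEXTS];
    size_t next = (q->tail + 1) % QUEUE_DEPTH;

    if (next == q->head)
        return -1;            /* queue full */

    q->ring[q->tail] = *req;
    q->tail = next;
    return 0;
}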
32.4 Challenges and Solutions
This section discusses an initial approach to improve the current I/O path
with respect to each problem listed in Section 32.3.
32.4.1 NUMA Effects
At present, a typical server implements a non-uniform memory architecture
with at least two NUMA nodes. The results of running the flexible I/O (FIO)
tester [3] on a standard 8-core Intel machine with two sockets and relatively
uniform memory accesses are shown in Figure 32.1. The workload consists
of 4-KB random read requests issued by one to eight application threads. The
two curves correspond to two runs: one that takes into account the affinity
between threads and the memory they allocate, and a base execution that does
not. Beyond four application threads, where both system sockets are occupied
with threads, the performance of the base execution deteriorates by up to 21%
compared to the run that takes affinity into account.
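As a rough sketch of what taking this affinity into account involves, the fragment below confines a thread to one NUMA node and allocates its I/O buffer from that node's memory via libnuma. It is an illustrative assumption rather than the actual benchmark harness; the node choice and buffer handling are placeholders, and it must be linked with -lnuma.

#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE (4 * 1024)   /* matches the 4-KB request size used in the runs */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    int node = 0;             /* example node; a real run would spread threads over nodes */

    /* Restrict the calling thread to CPUs of the chosen node. */
    if (numa_run_on_node(node) != 0) {
        perror("numa_run_on_node");
        return EXIT_FAILURE;
    }

    /* Allocate the thread's I/O buffer from the same node's memory. */
    void *buf = numa_alloc_onnode(BUF_SIZE, node);
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return EXIT_FAILURE;
    }

    memset(buf, 0, BUF_SIZE); /* touch the buffer so its pages are actually placed */

    /* ... issue 4-KB random reads into buf here ... */

    numa_free(buf, BUF_SIZE);
    return EXIT_SUCCESS;
}

When FIO is built with libnuma support, job options such as numa_cpu_nodes and numa_mem_policy can impose a similar placement directly from the job file, without modifying the application.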
Figures 32.2 and 32.3 display results for both FIO and the Interleaved or
Random (IOR) benchmark [8], an MPI-based benchmark that emulates
checkpoint patterns, on a 64-core AMD server with eight NUMA nodes
organized in four sockets. This system exhibits pronounced non-uniformity in
FIGURE 32.1: FIO read IOPS (4-KB random requests) of native (XFS file
system) with and without proper placement. Performance of the base execu-
tion deteriorates by at most 21%.