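[Plot for Figure 22.4. x-axis: Processors (ticks 1, 10, 100, 1000, 10000); y-axis ticks 0 to 20 (axis label not recoverable); series: asynchronous read with aggregation ratio 4:1, asynchronous read with aggregation ratio 1:1, synchronous read.]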
FIGURE 22.4: S3D read performance scaling. Asynchronous reads perform
much better than synchronous reads as the application scales up, and a
higher aggregation ratio improves performance further.
processor. Next, the aggregator processor parses the received message and
sorts all requests by their address (i.e., file_idx and offset), where file_idx is
the subfile ID and offset is the file offset within that subfile. If a set of requests
falls into a window of a certain size (e.g., 64 MB), a single read request that
covers the entire set is issued to the file system. The
aggregator then extracts the portion each member requested and sends it
back to that member via MPI messaging. The staged read performance is shown
in Figure 22.4. Staged read (i.e., asynchronous read) outperforms
synchronous read substantially, by about 3 to 5 times in 96,000-core runs. At
low core counts, the three techniques achieve largely the same performance
due to the relatively high cost of staged read.
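The sort-and-merge step performed by the aggregator can be illustrated with a minimal Python sketch. It is not the ADIOS implementation: the `Request` record, its `rank` and `length` fields, and the `coalesce` and `extract` helpers are hypothetical names introduced here, and the merge rule (a group may span at most one window within a single subfile) is one plausible reading of the description above.

```python
from dataclasses import dataclass

WINDOW = 64 * 1024 * 1024  # 64 MB, the example window size from the text


@dataclass(frozen=True)
class Request:
    rank: int      # hypothetical: member process that issued the request
    file_idx: int  # subfile ID
    offset: int    # byte offset within the subfile
    length: int    # hypothetical: number of bytes requested


def coalesce(requests, window=WINDOW):
    """Sort requests by (file_idx, offset) and merge neighbours.

    Returns a list of (file_idx, start, end, members) tuples. Each tuple
    stands for one read of bytes [start, end) from one subfile, and its
    span never exceeds `window` bytes unless a single request is larger.
    """
    ordered = sorted(requests, key=lambda r: (r.file_idx, r.offset))
    groups = []
    for r in ordered:
        end = r.offset + r.length
        if groups:
            g_idx, g_start, g_end, members = groups[-1]
            same_file = g_idx == r.file_idx
            if same_file and max(g_end, end) - g_start <= window:
                # Request fits in the current window: extend the group.
                groups[-1] = (g_idx, g_start, max(g_end, end), members + [r])
                continue
        # Different subfile or window exceeded: start a new group.
        groups.append((r.file_idx, r.offset, end, [r]))
    return groups


def extract(buffer, group_start, request):
    """Slice one member's bytes out of the buffer read for its group."""
    lo = request.offset - group_start
    return buffer[lo:lo + request.length]
```

In this sketch the aggregator would issue one file system read per returned group and then call `extract` once per member before sending the result back. Gaps between merged requests are read and discarded, which trades some extra bytes for fewer I/O calls.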
22.2.9 Limitations
Staged read is essentially a type of "delayed read" technique, in which
memory buffers passed into the adios_read_var() call become valid
only after file close, i.e., after the adios_fclose() call. This restricts the technique
from being used in applications where a few variables, for example sizing/dimension data,
must be read before others. Fortunately, ADIOS-BP
stores scalar values directly in the metadata, so their values are available
immediately after file open, which avoids the problem to some degree.
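The delayed-read semantics can be illustrated with a toy Python model. This is a sketch of the behaviour only, not the ADIOS API: the `DelayedReader` class, its `schedule_read` and `close` methods, and the variable names `temperature` and `nx` are all hypothetical.

```python
class DelayedReader:
    """Toy stand-in for a staged-read file handle (hypothetical, not ADIOS)."""

    def __init__(self, data, scalars):
        self._data = data             # variable name -> values "on disk"
        self.scalars = dict(scalars)  # usable right after open, like
                                      # scalars stored in the metadata
        self._pending = []

    def schedule_read(self, name, buffer):
        # Only records the request; the buffer is NOT filled here.
        self._pending.append((name, buffer))

    def close(self):
        # All deferred reads are served at close time.
        for name, buffer in self._pending:
            buffer[:] = self._data[name][:len(buffer)]
        self._pending.clear()


reader = DelayedReader({"temperature": [1.0, 2.0, 3.0]}, {"nx": 3})
buf = [0.0] * reader.scalars["nx"]  # the size is known immediately after open
reader.schedule_read("temperature", buf)
# buf is still [0.0, 0.0, 0.0] at this point
reader.close()
# buf now holds [1.0, 2.0, 3.0]
```

The sketch shows why scalar values kept in the metadata help: the buffer can be sized from `nx` before any array data has been delivered, even though the array contents only arrive at close.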
Another disadvantage is that chunk reading might increase the overhead in
 