contains the size of the record and the basic metadata to describe the
variable. The metadata include the name of the variable, the name of
the group the variable is associated with, the data type of the variable,
and a series of characteristic features. More details can be found in the
ADIOS manual [5].
22.2.2 Staged Write Method
The staged write method rests on two ideas. First, perform bulk writes that
are as large as possible, using an aggressive buffering and aggregation
strategy. Data are aggregated at two levels: within a single processor and
among a subset of processors (i.e., a group). Second, further eliminate
contention in the parallel file system and in MPI collective communication.
The aggregated data are written out in multiple subfiles, with each group
outputting one file. Each subfile is striped on a single storage target, so
each group can write independently of the other groups. The design specifics
are highlighted next.
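The two aggregation levels described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the ADIOS implementation: the function names (`pack_local`, `aggregate_group`, `staged_write`) are hypothetical, processes are modeled as list entries rather than MPI ranks, and subfiles are modeled as entries in a dictionary rather than files on storage targets.

```python
from typing import Dict, List, Tuple


def pack_local(chunks: List[bytes]) -> bytes:
    """Level 1: merge one process's small output chunks into one buffer."""
    return b"".join(chunks)


def aggregate_group(local_buffers: List[bytes]) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Level 2: merge the buffers of all group members into one payload.

    Returns the payload plus one (offset, length) entry per member, so that
    each member's data can be located inside the subfile later.
    """
    index = []
    offset = 0
    for buf in local_buffers:
        index.append((offset, len(buf)))
        offset += len(buf)
    return b"".join(local_buffers), index


def staged_write(per_process_chunks: List[List[bytes]], group_size: int) -> Dict[int, bytes]:
    """Simulate the staged write: one large write per group, one subfile each.

    Subfiles are modeled as dict entries keyed by group number (an assumption
    of this sketch); a real implementation would issue one file-system write
    per group instead.
    """
    local = [pack_local(chunks) for chunks in per_process_chunks]
    subfiles = {}
    for start in range(0, len(local), group_size):
        payload, _index = aggregate_group(local[start:start + group_size])
        subfiles[start // group_size] = payload
    return subfiles
```

For example, calling `staged_write([[b"a", b"b"], [b"c"], [b"d", b"e"], [b"f"]], group_size=2)` merges six small chunks from four simulated processes into two payloads, one per group, so only two large writes would reach the file system.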
22.2.3 Group-Based Hierarchical I/O Control
The group-based hierarchical I/O control scheme (see Figure 22.1) divides
all MPI processes into subgroups based on their MPI rank, to avoid the
overhead of MPI collective operations across all processes. Denote an MPI
process as $P_i$, where $0 \le i < N$ and $N$ is the total number of MPI
processes. Let $M$ be the number of subfiles to generate and, without loss
of generality, assume that $N$ is a multiple of $M$. Each group therefore
contains $N/M$ processes, and group $G_i$ consists of the processes
$[P_{(N/M)i}, P_{(N/M)(i+1)-1}]$. Each group is assigned a group coordinator
(i.e., aggregator), which is responsible for managing the data sent from all
members of group $G_i$ and writing them to disk on behalf of the group. It
also collects an index from its members during index generation. Note that
only the aggregator process in each group issues writes to the file system,
which significantly reduces write contention.
Similarly, to generate a metadata file that describes the entire dataset
scattered across all subfiles, a global coordinator is chosen, e.g., $P_0$.
More complex leader election algorithms could be adopted here to balance
load across all processes; this is left for future study. All processes
within $G_i$ write to a single subfile, thereby completely avoiding
interference from other groups. Overall, group-based I/O control is designed
to eliminate bottlenecks and thereby achieve scalability. The number of
groups is specified as an external parameter in the ADIOS XML file. By and
large, the choice of this number is a tradeoff between (1) aggregation
(subject to memory constraints) and (2) utilization of as many storage
targets as possible to improve concurrency.
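The rank-to-group mapping above reduces to integer arithmetic, which the following sketch makes concrete. The function names are hypothetical, and choosing the lowest rank of each group as its aggregator is an assumption of this sketch; the text only states that each group has one aggregator and that the global coordinator may be, e.g., process 0.

```python
from typing import Tuple


def group_of(rank: int, n_procs: int, n_subfiles: int) -> int:
    """Return the index i of the group G_i that holds process P_rank."""
    if n_procs % n_subfiles != 0:
        raise ValueError("this sketch assumes N is a multiple of M")
    if not 0 <= rank < n_procs:
        raise ValueError("rank must satisfy 0 <= rank < N")
    return rank // (n_procs // n_subfiles)


def group_range(i: int, n_procs: int, n_subfiles: int) -> Tuple[int, int]:
    """Return the inclusive rank range [(N/M)*i, (N/M)*(i+1) - 1] of G_i."""
    size = n_procs // n_subfiles
    return size * i, size * (i + 1) - 1


def aggregator_of(i: int, n_procs: int, n_subfiles: int) -> int:
    """Pick the lowest rank in G_i as its aggregator (assumption of this sketch)."""
    return group_range(i, n_procs, n_subfiles)[0]
```

With N = 8 processes and M = 4 subfiles, each group holds two processes: `group_of(5, 8, 4)` gives group 2, `group_range(2, 8, 4)` gives ranks 4 through 5, and `aggregator_of(2, 8, 4)` gives rank 4. In an MPI program, the same group index could plausibly serve as the color argument to `MPI_Comm_split` to build one communicator per group.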
 