The number of files, N, is typically chosen to match the number of independent
I/O pathways available from compute nodes to the file system. In practice, this
number falls between 8 and 1024, depending on problem size and the compute and
file system resources involved.
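As a rough illustration of this choice, the sketch below clamps an assumed,
site-specific count of independent I/O pathways to the 8-to-1024 range cited
above and to the number of MPI tasks; the function name and heuristic are
illustrative, not part of Silo or ALE3D.

#include <mpi.h>

/* Illustrative heuristic only: clamp an assumed count of independent
 * I/O pathways to the 8..1024 range mentioned in the text, and never
 * use more files than there are MPI tasks. */
static int pick_file_count(int num_io_pathways, MPI_Comm comm)
{
    int ntasks, n = num_io_pathways;
    MPI_Comm_size(comm, &ntasks);
    if (n < 8)      n = 8;
    if (n > 1024)   n = 1024;
    if (n > ntasks) n = ntasks;
    return n;
}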
ALE3D groups MPI tasks into N groups and each group is responsible for
creating one of the N files. At any one moment, only one MPI task from each
group has exclusive access to the file. Hence, I/O is serial within a group.
However, because one task in each group is writing to its group's own file
simultaneously, I/O is parallel across groups. Within a group, access to the
group's file is handled in a round-robin fashion. The first MPI task in the group
creates the file and then iterates over all domains it has. For each domain, it
creates a sub-directory within the file (i.e., a separate namespace for Silo
objects) and writes all the Silo objects (the main mesh domain, the material
composition of the domain, the mesh variables defined on the domain) to that
directory. It repeats this process for each domain. Then, the first MPI task
closes the Silo file and hands off exclusive access to the next task in the group.
That MPI task opens the file and iterates over all domains in the same way.
Exclusive access to the file is then handed off to the next task. This process,
shown in Figure 21.2, continues until all processors in the group have written
their domains to unique sub-directories in the file.
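The following is a minimal sketch of this baton-passing pattern, using plain MPI
point-to-point messages and basic Silo calls (DBCreate, DBOpen, DBMkDir,
DBSetDir, DBClose). The contiguous group assignment, the file and directory
names, and the write_my_domain placeholder are illustrative assumptions rather
than ALE3D's actual code, and each task is shown writing a single domain for
brevity.

#include <mpi.h>
#include <silo.h>
#include <stdio.h>

/* Placeholder for the application-specific writes (the mesh object, the
 * material composition, and the mesh variables described above); real
 * code would call DBPutUcdmesh, DBPutMaterial, DBPutUcdvar, etc. here. */
static void write_my_domain(DBfile *dbfile, int rank)
{
    (void)dbfile; (void)rank;
}

/* Round-robin (baton-passing) write of one group's Silo file.
 * Assumes ntasks is divisible by nfiles and one domain per task. */
static void write_group_file(int rank, int ntasks, int nfiles, MPI_Comm comm)
{
    int tasks_per_group = ntasks / nfiles;
    int group       = rank / tasks_per_group;   /* which file this task shares */
    int rank_in_grp = rank % tasks_per_group;   /* position in the relay       */
    char fname[64], dirname[64];
    DBfile *dbfile;
    int token = 0;

    snprintf(fname, sizeof fname, "plot_%03d.silo", group);

    if (rank_in_grp == 0) {
        /* The first task in the group creates the file. */
        dbfile = DBCreate(fname, DB_CLOBBER, DB_LOCAL, "MIF sketch", DB_HDF5);
    } else {
        /* Every other task waits for the baton from its predecessor,
         * then opens the existing file for appending. */
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, comm, MPI_STATUS_IGNORE);
        dbfile = DBOpen(fname, DB_HDF5, DB_APPEND);
    }

    /* Write this task's domain into its own sub-directory, which acts
     * as a separate namespace for its Silo objects. */
    snprintf(dirname, sizeof dirname, "domain_%06d", rank);
    DBMkDir(dbfile, dirname);
    DBSetDir(dbfile, dirname);
    write_my_domain(dbfile, rank);
    DBClose(dbfile);

    /* Hand exclusive access to the next task in the group, if any. */
    if (rank_in_grp < tasks_per_group - 1)
        MPI_Send(&token, 1, MPI_INT, rank + 1, 0, comm);
}

Silo also provides a small convenience layer, PMPIO, that packages this
baton-passing logic behind create/open/close callbacks; the hand-rolled version
above is shown only to make the handoff explicit.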
After all groups have finished writing their Silo files, a final step involves
creating a master Silo file which contains special Silo objects (called multi-
block objects) that point at all the pieces of mesh (domains) scattered about
in the N files.
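A hedged sketch of that final step might look as follows: a single task writes a
master file containing a multi-block mesh object whose entries use Silo's
"file:path" naming to reference the domain meshes in the N group files. The
file, directory, and object names, and the assumption of unstructured domain
meshes, follow the previous sketch and are illustrative.

#include <silo.h>
#include <stdio.h>
#include <stdlib.h>

/* One task writes a master file whose multi-block mesh object points at
 * every domain mesh scattered across the N group files.  Names follow
 * the (assumed) conventions of the previous sketch. */
static void write_master_file(int ntasks, int nfiles)
{
    int tasks_per_group = ntasks / nfiles;
    char **names = (char **) malloc(ntasks * sizeof(char *));
    int   *types = (int *)   malloc(ntasks * sizeof(int));
    DBfile *master = DBCreate("plot_master.silo", DB_CLOBBER, DB_LOCAL,
                              "MIF master sketch", DB_HDF5);
    int r;

    for (r = 0; r < ntasks; r++) {
        int group = r / tasks_per_group;
        names[r] = (char *) malloc(256);
        /* "<file>:<path within file>" is how Silo multi-block entries
         * reference objects stored in other files. */
        snprintf(names[r], 256, "plot_%03d.silo:/domain_%06d/mesh", group, r);
        types[r] = DB_UCDMESH;   /* assumes unstructured domain meshes */
    }

    /* The multimesh object presents the scattered domains as one
     * logical mesh to tools that read Silo multi-block objects. */
    DBPutMultimesh(master, "mesh", ntasks, names, types, NULL);
    DBClose(master);

    for (r = 0; r < ntasks; r++)
        free(names[r]);
    free(names);
    free(types);
}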
Setting N equal to the number of MPI tasks results in a file-per-
process configuration, which is typically not recommended for users. However,
some applications do indeed choose to run this way with good results. Alter-
natively, setting N equal to 1 results in effectively serializing the I/O and is
certainly not recommended. For large, parallel runs, there is a sweet spot in
the selection of N which results in peak I/O performance rates. If N is too
large, the I/O subsystem will likely be overwhelmed; setting it too small will
likely underutilize the system resources. This is illustrated in Figure 21.3 for
different numbers of files and MPI task counts.
21.3 MIF and SSF Scalable I/O Paradigms
This approach to using Silo for scalable, parallel I/O was originally de-
veloped in the late 1990s by Rob Neely, a lead software architect on ALE3D
at the time. This approach is sometimes called "Poor Man's Parallel I/O." It
and variations thereof have since been adopted and used productively through
several order-of-magnitude increases in MPI task counts, from hundreds then to
hundreds of thousands today.
 