scheme to the cache and recover checkpoints after a failure disables a small
portion of the system.
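SCR's ability to rebuild a checkpoint after losing part of the system rests on redundancy encoding across the node-local caches. As a minimal sketch (not SCR's actual implementation), an XOR parity block computed across one checkpoint chunk per node lets any single missing chunk be reconstructed from the survivors:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def make_parity(chunks):
    """Compute a parity block across one checkpoint chunk per node."""
    return xor_blocks(chunks)

def recover(surviving_chunks, parity):
    """Rebuild the single missing chunk from survivors plus parity."""
    return xor_blocks(surviving_chunks + [parity])

# Three nodes each hold a checkpoint chunk in node-local cache.
chunks = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(chunks)

# Node 1 fails; its chunk is rebuilt from the other nodes and the parity.
rebuilt = recover([chunks[0], chunks[2]], parity)
assert rebuilt == chunks[1]
```

Because the parity is small relative to the full checkpoint, this tolerates the loss of a single node's cache at a fraction of the cost of writing every checkpoint to the parallel file system.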
The SCR team uses a technique called remote direct memory access
(RDMA), which pulls data off the node without involving the processor in
data movement. Different nodes can be coordinated to schedule their writing
to the file system.
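One simple coordination policy is to flush checkpoints in waves, so that only a bounded number of nodes write to the file system at once. The sketch below (function and parameter names are illustrative, not SCR's API) derives each node's start delay from its rank:

```python
def write_slot(rank, nodes_per_wave, wave_seconds):
    """Return the delay in seconds before this rank starts flushing
    its checkpoint, so only nodes_per_wave nodes write concurrently."""
    wave = rank // nodes_per_wave
    return wave * wave_seconds

# With 8 ranks and 4 writers per wave, ranks 0-3 start immediately
# and ranks 4-7 wait one wave before writing.
delays = [write_slot(r, nodes_per_wave=4, wave_seconds=30.0) for r in range(8)]
assert delays == [0.0, 0.0, 0.0, 0.0, 30.0, 30.0, 30.0, 30.0]
```

Staggering writes this way trades some wall-clock time for a bounded load on the file system servers, avoiding the contention collapse that can occur when every node flushes simultaneously.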
5.8 Conclusion
The scientific mission of LLNL requires unique and cutting-edge computer
systems. Deliverables require balanced I/O capability and an equally cutting-edge
combination of file system hardware and software. Meeting such requirements
for BG/Q was a significant undertaking, requiring a refactoring of file
system internals for scalability. The undertaking was successful, but not without
challenges. In particular, applications can make this work easier by leveraging
the strengths of the underlying system and avoiding known system weaknesses.
The move from the present systems to Exascale systems will be even more
daunting. Continued scaling of the file system will be difficult, in part due to
the growing disparity between aggregate CPU performance and the bandwidth
advances of spinning storage devices. Particular attention will need to be
paid to the use of burst buffer (NVRAM) technologies to address bandwidth
challenges. Fundamental APIs such as POSIX will also need to be reconsidered
or refactored.