There is a single GPFS file system spanning the entire 20 PB of storage.
Drawing on lessons learned with Intrepid, ALCF kept the eight-server
resiliency groups on each DDN array and started out with dedicated nodes
serving as GPFS managers. ALCF also decided to improve metadata performance
by allocating SSD LUNs to metadata only and SATA LUNs to data only.
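The metadata/data split described above is expressed in GPFS on a per-NSD basis through the `usage` attribute. The Python sketch below renders NSD stanzas in the format accepted by `mmcrnsd -F` starting with GPFS 3.5. It assumes that SSD LUNs become `metadataOnly` NSDs and SATA LUNs become `dataOnly` NSDs, all placed in the `system` pool. The device paths, server names, NSD names, and failure-group numbering are hypothetical and do not reflect ALCF's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lun:
    device: str               # block device seen by the NSD servers (hypothetical)
    servers: tuple[str, ...]  # NSD servers that export this LUN (hypothetical)
    media: str                # "ssd" or "sata"
    failure_group: int        # failure-group numbering is an assumption


def nsd_stanza(name: str, lun: Lun) -> str:
    """Render one NSD stanza: SSD LUNs hold metadata only, SATA LUNs data only."""
    usage = {"ssd": "metadataOnly", "sata": "dataOnly"}[lun.media]
    return "\n".join([
        f"%nsd: device={lun.device}",
        f"  nsd={name}",
        f"  servers={','.join(lun.servers)}",
        f"  usage={usage}",
        f"  failureGroup={lun.failure_group}",
        "  pool=system",
    ])


def stanza_file(luns: list[Lun]) -> str:
    """Number the LUNs nsd0001, nsd0002, ... and join their stanzas into one file."""
    stanzas = (nsd_stanza(f"nsd{i:04d}", lun) for i, lun in enumerate(luns, 1))
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    # Hypothetical eight-server group, mirroring the resiliency groups above.
    servers = tuple(f"fs{n:02d}" for n in range(1, 9))
    luns = [
        Lun("/dev/mapper/ssd_lun0", servers, "ssd", 1),
        Lun("/dev/mapper/sata_lun0", servers, "sata", 1),
    ]
    print(stanza_file(luns))
```

Keeping the rule in one function (`usage` is derived from the media type) ensures that a metadata NSD cannot accidentally be placed on SATA when the stanza file is regenerated.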
4.4 I/O Software
4.4.1 GPFS
ALCF deployed similar GPFS configurations on its two main systems,
Intrepid and Mira. Intrepid ran GPFS versions 3.2 and 3.3 over its
lifetime; Mira runs GPFS version 3.5. On Intrepid, ALCF upgraded GPFS only
when a fix was needed rather than updating proactively. This reactive
approach became a problem when running file system checks: because several
fixes were released for the fsck feature, GPFS had to be upgraded multiple
times. On Mira, ALCF plans to track GPFS updates more closely and to vet
every new update on its test and development system first.
4.4.1.1 Configuration
For both Intrepid and Mira, ALCF configured the storage system as an NSD
cluster containing only file servers and dedicated manager nodes, plus
several remote clusters that contain only clients. A core configuration
parameter of each file system is its block size: 4 MiB on Intrepid and
8 MiB on Mira. Both file systems use scatter block allocation. On Intrepid,
neither data nor metadata is replicated, so data protection relies entirely
on the RAID hardware. On Mira, metadata is replicated. Experience on
Intrepid showed that taking the file system offline for any reason was
extremely disruptive. Without metadata replication, even a single LUN going
down (1 of 1,152) risked making key top-level directories unavailable,
leaving most of the file system impossible to navigate.
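The risk described above follows from how path lookup works. A directory can only be reached if every directory on the path from the root still has a readable metadata copy. The toy Python model below illustrates this. It is not GPFS's actual allocator: it assumes that each directory's metadata lives on one block whose replicas sit on distinct, randomly chosen LUNs out of the 1,152 cited for Intrepid (real GPFS places replicas in distinct failure groups). The directory tree is hypothetical.

```python
import random

NUM_LUNS = 1152  # LUN count cited for Intrepid


def place(dirs, replicas, rng):
    """Toy scatter placement: each directory's metadata on `replicas` distinct LUNs."""
    return {d: set(rng.sample(range(NUM_LUNS), replicas)) for d in dirs}


def ancestors(d, parent):
    """Yield d, its parent, ..., up to the root (whose parent is None)."""
    while d is not None:
        yield d
        d = parent[d]


def navigable_fraction(parent, placement, down_lun):
    """Fraction of directories whose whole lookup path keeps a live metadata copy."""
    ok = [all(placement[a] - {down_lun} for a in ancestors(d, parent)) for d in parent]
    return sum(ok) / len(ok)


def build_tree(top_level=4, per_top=250):
    """Hypothetical tree: a root, a few top-level directories, many subdirectories."""
    parent = {"/": None}
    for t in range(top_level):
        top = f"/top{t}"
        parent[top] = "/"
        for s in range(per_top):
            parent[f"{top}/d{s}"] = top
    return parent


if __name__ == "__main__":
    rng = random.Random(0)
    tree = build_tree()
    for replicas in (1, 2):
        placement = place(tree, replicas, rng)
        failed = min(placement["/"])  # worst case: a LUN holding root metadata fails
        frac = navigable_fraction(tree, placement, failed)
        print(f"metadata replicas={replicas}: navigable={frac:.0%}")
```

With a single copy, losing the LUN that holds the root's metadata leaves no directory navigable. With two copies on distinct LUNs, no single LUN failure can remove every copy of any directory, so the whole tree stays reachable. This is the property Mira's metadata replication provides.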
Another key lesson from Intrepid was poor performance when compiling and
linking executables: Intrepid stored data and metadata together on the same
SATA storage. The GPFS home file system on Intrepid was later upgraded to
use FusionIO flash PCIe cards for metadata, which greatly improved the user
experience. On Mira, ALCF built on this improvement by designing every DDN
SFA system with SSD LUNs dedicated to metadata. This generally speeds up
metadata-heavy operations such as ls and directory navigation.
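Operations such as `ls -l` issue many small, random metadata reads, so their cost scales roughly with per-read latency. The back-of-the-envelope Python sketch below makes that explicit. The latencies in the example are placeholder orders of magnitude for an uncached random read on a SATA disk versus an SSD. They are not ALCF measurements, and the model deliberately ignores GPFS caching and prefetching.

```python
def listing_time_s(entries, read_latency_ms, reads_per_entry=1, concurrency=1):
    """Rough wall time (seconds) for a metadata-bound listing such as `ls -l`.

    Assumes each entry costs `reads_per_entry` uncached random metadata reads
    and that `concurrency` reads overlap perfectly; caching is ignored.
    """
    total_reads = entries * reads_per_entry
    return total_reads * read_latency_ms / concurrency / 1000.0


if __name__ == "__main__":
    # Placeholder latencies (order of magnitude only, not measured values):
    # roughly 8 ms per random read on a SATA disk, roughly 0.1 ms on an SSD.
    for media, latency_ms in (("SATA", 8.0), ("SSD", 0.1)):
        print(media, f"{listing_time_s(10_000, latency_ms):.1f} s")
```

Because the total scales linearly with latency, moving metadata from SATA to SSD shortens this kind of operation by roughly the ratio of the two latencies. Parallel I/O only divides both cases by the same factor.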
 