between GPFS-FPO and HDFS is that GPFS-FPO is a kernel-level file system,
whereas HDFS runs on top of the operating system. Many limitations in HDFS stem
from the fact that it's not fully POSIX-compliant. On the other hand, GPFS-FPO
is 100 percent POSIX-compliant, which can give you significant flexibility.
Because GPFS is POSIX-compliant, files that are stored in GPFS-FPO are
visible to all applications, just like any other files stored on a computer. For
example, any authorized user can use traditional operating system commands
to list, copy, and move files in GPFS-FPO. This isn't
the case in HDFS, where users need to log into Hadoop to see the files in the
cluster. As for replication or backups, the only mechanism available for
HDFS is to copy files manually through the Hadoop command shell.
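To make the contrast concrete, here is a minimal sketch of those everyday operations done through Python's standard library (`os`, `shutil`) instead of a Hadoop-specific client. It assumes only that the GPFS-FPO file system is mounted like any other POSIX file system. The function name `list_copy_move` and the file names are hypothetical, and a temporary directory stands in for the mount point so the example runs anywhere. On HDFS, the equivalent operations would go through the Hadoop shell instead (for example, `hdfs dfs -ls`).

```python
import os
import shutil
import tempfile


def list_copy_move(root: str) -> list[str]:
    """Exercise ordinary POSIX file operations under `root`.

    On a POSIX-compliant file system such as GPFS-FPO, `root` could be the
    cluster's mount point (for example, a hypothetical /gpfs/fpo/data).
    Nothing here is Hadoop-specific.
    """
    src = os.path.join(root, "input.txt")
    with open(src, "w") as f:
        f.write("sample record\n")

    # Copy, as `cp` would.
    shutil.copy2(src, os.path.join(root, "input_copy.txt"))

    # Move into a subdirectory, as `mv` would.
    archive = os.path.join(root, "archive")
    os.makedirs(archive, exist_ok=True)
    shutil.move(src, os.path.join(archive, "input.txt"))

    # List, as `ls` would (sorted for a stable order).
    return sorted(os.listdir(root))


if __name__ == "__main__":
    # A temporary directory stands in for the mounted file system.
    with tempfile.TemporaryDirectory() as tmp:
        print(list_copy_move(tmp))  # ['archive', 'input_copy.txt']
```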
The full POSIX compliance of GPFS-FPO enables you to manage your
Hadoop storage just as you would storage on any other computer in your IT
environment. That's going to give you economies of scale when it comes to building
Hadoop skills, and just make life easier. For example, your traditional file
administration utilities will work, as will your backup and restore tooling
and procedures. GPFS-FPO will actually extend your backup capabilities
because it includes point-in-time (PiT) snapshot backup, off-site replication,
and other utilities.
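As a rough illustration of existing backup and restore tooling working unchanged, the sketch below archives a directory and restores it with Python's `tarfile` module, which stands in for ordinary `tar`-style tooling. It does not use GPFS-FPO's own point-in-time snapshot facility. It assumes only that the data sits on a POSIX path. The helper names `backup` and `restore` and the directory names are hypothetical.

```python
import os
import tarfile
import tempfile


def backup(src_dir: str, archive_path: str) -> None:
    """Archive src_dir into a gzip-compressed tarball."""
    name = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(src_dir, arcname=name)


def restore(archive_path: str, dest_dir: str) -> None:
    """Extract a tarball produced by backup() into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "warehouse")  # hypothetical data directory
        os.makedirs(data)
        with open(os.path.join(data, "part-00000"), "w") as f:
            f.write("row\n")
        archive = os.path.join(tmp, "warehouse.tar.gz")
        backup(data, archive)
        restored = os.path.join(tmp, "restored")
        restore(archive, restored)
        print(os.listdir(os.path.join(restored, "warehouse")))  # ['part-00000']
```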
With GPFS-FPO, you can safely manage multitenant Hadoop clusters: a robust
separation-of-concerns infrastructure lets other applications share the
cluster's resources. This isn't possible in HDFS.
This also helps from a capacity planning perspective, because without
GPFS-FPO, you would need to size the disk space dedicated to the
Hadoop cluster up front. In fact, not only do you have to estimate how much
data you need to store in HDFS, but you're also going to have to guess how
much storage you'll need for the output of MapReduce jobs, which can vary
widely by workload. Finally, don't forget that you need to account for space
that will be taken up by log files created by the Hadoop system too! With
GPFS-FPO, you only need to worry about the disks themselves filling up;
there's no need to dedicate storage for Hadoop.
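The up-front guesswork described above can be written as a back-of-the-envelope sum. This is a hypothetical model, not a sizing method from the text. It assumes HDFS's default block replication factor of 3 (the `dfs.replication` setting) applies to both stored input and MapReduce output, and it treats Hadoop log space as unreplicated local disk. The function name and all figures are illustrative.

```python
def dedicated_hdfs_capacity_tb(
    input_data_tb: float,
    job_output_tb: float,
    log_files_tb: float,
    replication_factor: int = 3,
) -> float:
    """Rough raw disk (TB) to dedicate to an HDFS cluster up front.

    Hypothetical assumptions:
    - input data and MapReduce job output both live in HDFS and are
      stored replication_factor times (HDFS's default is 3);
    - Hadoop log files sit on local disk and are not replicated.
    """
    if min(input_data_tb, job_output_tb, log_files_tb) < 0 or replication_factor < 1:
        raise ValueError("sizes must be non-negative and replication_factor >= 1")
    replicated = (input_data_tb + job_output_tb) * replication_factor
    return replicated + log_files_tb


if __name__ == "__main__":
    # Hypothetical figures: 100 TB of input, 40 TB of job output, 2 TB of logs.
    print(dedicated_hdfs_capacity_tb(100.0, 40.0, 2.0))  # 422.0
```

Every input to this sum is an estimate made before the cluster exists. The job-output term in particular can swing widely by workload, which is why a dedicated allocation is hard to get right.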
All of the characteristics that make GPFS the file system of choice for
large-scale mission-critical IT installations are applicable to GPFS-FPO. After all,
this is still GPFS, but with Hadoop-friendly extensions. You get the same