FIGURE 10.4: Direct interface showing the three user-level interface layers
available to programmers. Upper layers provide compatibility with POSIX;
lower layers provide non-standard features.
libmoddavorangefs module works with the Apache Web server, the standard
WebDAV module, and the OrangeFS client.
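Figure 10.4 notes that the upper interface layers provide compatibility with POSIX. In practice, an application written only against standard POSIX file calls should need no OrangeFS-specific code to use those layers. The minimal Python sketch below shows that style of application with os.open, os.write, os.read, and os.close, which wrap the POSIX calls of the same names. Assumption: STORAGE_DIR is a hypothetical placeholder for a directory served by OrangeFS. A temporary local directory stands in for it so the code runs anywhere.

```python
import os
import tempfile

# Hypothetical stand-in for a directory served by OrangeFS through its
# POSIX-compatible upper layer. A temporary local directory is used here
# so the sketch runs anywhere; the application code below does not change.
STORAGE_DIR = tempfile.mkdtemp()


def write_then_read(path, payload):
    """Write `payload` with POSIX open/write/close, then read it back."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    try:
        written = os.write(fd, payload)
        # POSIX allows short writes; this sketch treats one as an error
        # instead of retrying.
        if written != len(payload):
            raise OSError("short write")
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, len(payload))
    finally:
        os.close(fd)


data = write_then_read(os.path.join(STORAGE_DIR, "example.dat"), b"hello, OrangeFS")
print(data)  # b'hello, OrangeFS'
```

Nothing in write_then_read refers to OrangeFS. Under the assumption above, the same function would therefore run unchanged against a real OrangeFS path, with only the directory changing.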
Hadoop. MapReduce can be run directly over OrangeFS using an extension
of the MapReduce "FileSystem" class and a Java Native Interface (JNI)
shim to the OrangeFS client. No modifications to Hadoop are required,
and existing MapReduce jobs need no changes to use OrangeFS. With HDFS,
clients and data servers are paired together on the same hardware.
OrangeFS, by contrast, can run as part of an HPC cluster, leveraging an
existing investment in HPC to run Hadoop MapReduce workloads. Tests
running Hadoop MapReduce over OrangeFS have provided the following
insights: MapReduce clients accessing a remote OrangeFS storage cluster
yielded a 25% faster combined runtime for the three operations (teragen,
terasort, and teravalidate) than the traditional approach, in which
MapReduce clients access data locally.
OrangeFS and HDFS, without replication enabled, performed similarly
(within 0.2%) under identical local (traditional HDFS) configurations;
however, OrangeFS adds the advantages of a general-purpose, scale-out
file system. Because OrangeFS is general-purpose, applications can read
and write data in it while that data remains available as Hadoop
MapReduce job input, improving runtime by eliminating time-consuming
HDFS stage-in and stage-out operations.
Doubling the number of compute nodes accessing remote OrangeFS results
in about a 300% improvement in terasort job runtime. OrangeFS delivers
good results when clients significantly overcommit the storage servers.
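The comparisons above are ratios of wall-clock runtimes, and the text does not spell out how each percentage was computed. The minimal Python sketch below shows common ways such figures are derived. Assumptions: "X% faster" is read as the runtime reduction relative to the baseline, "within X%" as the difference relative to the baseline, and speedup as baseline time divided by new time. All function names and runtime values are hypothetical placeholders, not the measurements reported here. The last lines model the HDFS stage-in and stage-out overhead that a general-purpose file system avoids.

```python
# Minimal sketch of runtime arithmetic. All function names and runtime
# values are hypothetical placeholders, not measurements from these tests.

def combined_runtime(phases):
    """Sum the wall-clock times (seconds) of all phases in a benchmark run."""
    return sum(phases.values())


def percent_faster(baseline, candidate):
    """Runtime reduction of `candidate` relative to `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100.0


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline` (1.0 = no change)."""
    return baseline / candidate


def percent_difference(baseline, candidate):
    """Absolute difference relative to `baseline`, for 'within X%' comparisons."""
    return abs(candidate - baseline) / baseline * 100.0


def end_to_end(job, stage_in=0.0, stage_out=0.0):
    """Job time plus any time spent copying data in before and out after."""
    return stage_in + job + stage_out


# Illustrative per-phase runtimes (seconds) for a local and a remote setup.
local = {"teragen": 500.0, "terasort": 1000.0, "teravalidate": 500.0}
remote = {"teragen": 400.0, "terasort": 800.0, "teravalidate": 400.0}

base = combined_runtime(local)    # 2000.0
cand = combined_runtime(remote)   # 1600.0
print(f"{percent_faster(base, cand):.1f}% faster")  # 20.0% faster
print(f"speedup {speedup(base, cand):.2f}x")        # speedup 1.25x

# HDFS path: copy input in and results out around the job; a shared,
# general-purpose file system lets the job read and write data in place.
with_staging = end_to_end(1600.0, stage_in=600.0, stage_out=300.0)
in_place = end_to_end(1600.0)
print(f"{percent_faster(with_staging, in_place):.1f}% faster without staging")
# 36.0% faster without staging
```

A runtime reduction can never exceed 100%. An improvement figure such as the 300% terasort result is therefore presumably expressed on a speedup-style scale, although the text does not say which convention it uses.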