write) operations are sent first to the namenode, which sends an HTTP redirect to the client indicating the datanode to stream file data from (or to).
The second way of accessing HDFS over HTTP relies on one or more standalone proxy
servers. (The proxies are stateless, so they can run behind a standard load balancer.) All
traffic to the cluster passes through the proxy, so the client never accesses the namenode
or datanode directly. This allows for stricter firewall and bandwidth-limiting policies to be
put in place. It's common to use a proxy for transfers between Hadoop clusters located in
different data centers, or when accessing a Hadoop cluster running in the cloud from an
external network.
The HttpFS proxy exposes the same HTTP (and HTTPS) interface as WebHDFS, so clients can access both using webhdfs (or swebhdfs) URIs. The HttpFS proxy is started independently of the namenode and datanode daemons, using the httpfs.sh script, and by default listens on a different port number (14000).
C
Hadoop provides a C library called libhdfs that mirrors the Java FileSystem interface (it was written as a C library for accessing HDFS, but despite its name it can be used to access any Hadoop filesystem). It works using the Java Native Interface (JNI) to call a Java filesystem client. There is also a libwebhdfs library that uses the WebHDFS interface described in the previous section.
The C API is very similar to the Java one, but it typically lags behind, so some newer features may not be supported. You can find the header file, hdfs.h, in the include directory of the Apache Hadoop binary tarball distribution.
The Apache Hadoop binary tarball comes with prebuilt libhdfs binaries for 64-bit Linux, but for other platforms you will need to build them yourself by following the BUILDING.txt instructions at the top level of the source tree.
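A minimal read through libhdfs might look like the sketch below. It assumes a reachable cluster and uses a placeholder file path; compile it against hdfs.h and link with the libhdfs binary, with a JVM available, since libhdfs calls the Java client through JNI.

```c
/* Sketch: reading a file with libhdfs. The path is a placeholder, and a
   configured, reachable cluster is assumed. */
#include "hdfs.h"
#include <fcntl.h>
#include <stdio.h>

int main(void) {
    /* "default" picks up the default filesystem from the client
       configuration, mirroring the Java FileSystem behavior. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        fprintf(stderr, "hdfsConnect failed\n");
        return 1;
    }

    hdfsFile in = hdfsOpenFile(fs, "/user/tom/data.txt", O_RDONLY, 0, 0, 0);
    if (in == NULL) {
        fprintf(stderr, "hdfsOpenFile failed\n");
        hdfsDisconnect(fs);
        return 1;
    }

    /* Stream the file contents to stdout. */
    char buf[4096];
    tSize n;
    while ((n = hdfsRead(fs, in, buf, sizeof(buf))) > 0) {
        fwrite(buf, 1, (size_t)n, stdout);
    }

    hdfsCloseFile(fs, in);
    hdfsDisconnect(fs);
    return 0;
}
```

Note how closely the calls track the Java API: hdfsConnect, hdfsOpenFile, and hdfsRead correspond to getting a FileSystem, opening an input stream, and reading from it.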
NFS
It is possible to mount HDFS on a local client's filesystem using Hadoop's NFSv3 gateway. You can then use Unix utilities (such as ls and cat) to interact with the filesystem, upload files, and in general use POSIX libraries to access the filesystem from any programming language. Appending to a file works, but random modifications of a file do not, since HDFS can only write to the end of a file.
Consult the Hadoop documentation for how to configure and run the NFS gateway and
connect to it from a client.