Filesystem | URI scheme | Java implementation (all under org.apache.hadoop) | Description
(cont.) | | | … older s3n (S3 native) implementation.
Azure | wasb | fs.azure.NativeAzureFileSystem | A filesystem backed by Microsoft Azure.
Swift | swift | fs.swift.snative.SwiftNativeFileSystem | A filesystem backed by OpenStack Swift.
Hadoop provides many interfaces to its filesystems, and it generally uses the URI scheme
to pick the correct filesystem instance to communicate with. For example, the filesystem
shell that we met in the previous section operates with all Hadoop filesystems. To list the
files in the root directory of the local filesystem, type:
% hadoop fs -ls file:///
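The idea of choosing a filesystem by URI scheme can be illustrated with a small toy model. This is a hypothetical sketch, not Hadoop's source: real client code would call FileSystem.get(URI, Configuration) and let Hadoop do the lookup. The map below is an assumption assembled from the table above, plus the well-known local (file) and HDFS (hdfs) implementations. DEFAULT_SCHEME is a hypothetical stand-in for Hadoop's configured default filesystem.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

// Toy model (hypothetical, not Hadoop source): pick a filesystem
// implementation class from the scheme of a URI.
public class SchemeDispatch {

    // Scheme -> implementation class, relative to org.apache.hadoop.
    private static final Map<String, String> IMPLEMENTATIONS = Map.of(
        "file", "fs.LocalFileSystem",
        "hdfs", "hdfs.DistributedFileSystem",
        "wasb", "fs.azure.NativeAzureFileSystem",
        "swift", "fs.swift.snative.SwiftNativeFileSystem"
    );

    // Hypothetical fallback used when a URI has no scheme (e.g. "/user/tom").
    private static final String DEFAULT_SCHEME = "file";

    static String implementationFor(String uriString) {
        URI uri = URI.create(uriString);
        String scheme = uri.getScheme() == null ? DEFAULT_SCHEME : uri.getScheme();
        String impl = IMPLEMENTATIONS.get(scheme.toLowerCase(Locale.ROOT));
        if (impl == null) {
            throw new IllegalArgumentException("unknown scheme: " + scheme);
        }
        return "org.apache.hadoop." + impl;
    }

    public static void main(String[] args) {
        System.out.println(implementationFor("file:///"));
        System.out.println(implementationFor("wasb://container@account.blob.core.windows.net/data"));
        System.out.println(implementationFor("/user/tom"));
    }
}
```

Keeping the scheme-to-class mapping in one table mirrors the design described in the text: the same client code (such as the filesystem shell) works against any filesystem, and only the URI decides which implementation handles the request.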
Although it is possible (and sometimes very convenient) to run MapReduce programs that
access any of these filesystems, when you are processing large volumes of data you
should choose a distributed filesystem that has the data locality optimization, notably
HDFS (see Scaling Out).
Interfaces
Hadoop is written in Java, so most Hadoop filesystem interactions are mediated through
the Java API. The filesystem shell, for example, is a Java application that uses the Java
FileSystem class to provide filesystem operations. The other filesystem interfaces are
discussed briefly in this section. These interfaces are most commonly used with HDFS,
since the other filesystems in Hadoop typically have existing tools to access the underlying
filesystem (FTP clients for FTP, S3 tools for S3, etc.), but many of them will work
with any Hadoop filesystem.
HTTP
By exposing its filesystem interface as a Java API, Hadoop makes it awkward for non-Java
applications to access HDFS. The HTTP REST API exposed by the WebHDFS protocol
makes it easier for other languages to interact with HDFS. Note that the HTTP interface
is slower than the native Java client, so it should be avoided for very large data transfers
if possible.
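To make the WebHDFS idea concrete, here is a minimal sketch of a client listing a directory over HTTP using only the JDK. It assumes the standard WebHDFS URL layout, http://HOST:PORT/webhdfs/v1/PATH?op=LISTSTATUS. The host name namenode.example.com is hypothetical. Port 9870 is assumed to be the NameNode HTTP port, which is the Hadoop 3 default. The request in main needs a running cluster; listStatusUri is the part that can be checked offline.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a plain-JDK WebHDFS client that lists a directory.
public class WebHdfsList {

    // Builds http://<host>:<port>/webhdfs/v1<path>?op=LISTSTATUS.
    // The multi-argument URI constructor percent-encodes illegal
    // characters in the path (for example, spaces become %20).
    static URI listStatusUri(String host, int port, String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + path);
        }
        try {
            return new URI("http", null, host, port, "/webhdfs/v1" + path, "op=LISTSTATUS", null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical NameNode address; adjust for a real cluster.
        URI uri = listStatusUri("namenode.example.com", 9870, "/user/tom");
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        // On success the body is a JSON "FileStatuses" document.
        System.out.println(response.body());
    }
}
```

Nothing here depends on Hadoop's Java libraries, which is the point of the REST interface: any language with an HTTP client could issue the same request.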
There are two ways of accessing HDFS over HTTP: directly, where the HDFS daemons
serve HTTP requests to clients; and via a proxy (or proxies), which accesses HDFS on the