    try {
      in = new URL(args[0]).openStream();
      IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
    }
  }
}
We make use of the handy IOUtils class that comes with Hadoop for closing the stream in the finally clause, and also for copying bytes between the input stream and the output stream (System.out, in this case). The last two arguments to the copyBytes() method are the buffer size used for copying and whether to close the streams when the copy is complete. We close the input stream ourselves, and System.out doesn't need to be closed.
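To make the meaning of those last two arguments concrete, here is a simplified, standard-library-only sketch of what a copy routine like IOUtils.copyBytes() does; the class name CopyBytesSketch and this implementation are illustrative, not Hadoop's actual code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyBytesSketch {
  // Simplified analogue of IOUtils.copyBytes(in, out, buffSize, close):
  // read into a fixed-size buffer until EOF, optionally closing both streams.
  public static void copyBytes(InputStream in, OutputStream out,
                               int buffSize, boolean close) throws IOException {
    try {
      byte[] buf = new byte[buffSize];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    } finally {
      if (close) {
        in.close();
        out.close();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    InputStream in =
        new ByteArrayInputStream("On the top of the Crumpetty Tree".getBytes("UTF-8"));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // false: the caller is responsible for closing the streams afterward,
    // just as in the URLCat example above.
    copyBytes(in, out, 4096, false);
    System.out.println(out.toString("UTF-8"));
  }
}
```

Passing false for the close flag matches the pattern in the example above: we close the input stream in our own finally clause, and leave System.out open.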
% export HADOOP_CLASSPATH=hadoop-examples.jar
% hadoop URLCat hdfs://localhost/user/tom/quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
Reading Data Using the FileSystem API
As the previous section explained, sometimes it is impossible to set a URLStreamHandlerFactory for your application. In this case, you will need to use the FileSystem API to open an input stream for a file.
A file in a Hadoop filesystem is represented by a Hadoop Path object (and not a java.io.File object, since its semantics are too closely tied to the local filesystem). You can think of a Path as a Hadoop filesystem URI, such as hdfs://localhost/user/tom/quangle.txt.
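Since a Path wraps a filesystem URI, its pieces line up with what java.net.URI reports; this small standard-library-only illustration pulls apart the example URI above (the PathAsUri class name is just for this sketch):

```java
import java.net.URI;

public class PathAsUri {
  public static void main(String[] args) {
    // A Hadoop Path is essentially a filesystem URI like this one.
    URI uri = URI.create("hdfs://localhost/user/tom/quangle.txt");
    System.out.println(uri.getScheme());    // hdfs (the filesystem scheme)
    System.out.println(uri.getAuthority()); // localhost (the namenode host)
    System.out.println(uri.getPath());      // /user/tom/quangle.txt
  }
}
```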
FileSystem is a general filesystem API, so the first step is to retrieve an instance for the filesystem we want to use (HDFS, in this case). There are several static factory methods for getting a FileSystem instance:
public static FileSystem get(Configuration conf) throws IOException
public static FileSystem get(URI uri, Configuration conf) throws IOException
public static FileSystem get(URI uri, Configuration conf, String