As an alternative to creating a new file, you can append to an existing file using the append() method (there are also some other overloaded versions):
public FSDataOutputStream append(Path f) throws IOException
The append operation allows a single writer to modify an already written file by opening it and writing data from the final offset in the file. With this API, applications that produce unbounded files, such as logfiles, can write to an existing file after having closed it. The append operation is optional and not implemented by all Hadoop filesystems. For example, HDFS supports append, but S3 filesystems don't.
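On a filesystem that supports it, an append might look like the following sketch. The file path and the record being written are hypothetical, chosen only for illustration; the append() call is the API shown above.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String uri = "hdfs://localhost/user/tom/app.log"; // hypothetical logfile

    FileSystem fs = FileSystem.get(URI.create(uri), conf);

    // Open the existing file at its final offset; new writes are appended.
    FSDataOutputStream out = fs.append(new Path(uri));
    try {
      out.writeBytes("another log record\n");
    } finally {
      out.close();
    }
  }
}
```

If the underlying filesystem does not implement append (an S3 filesystem, for instance), the append() call throws an UnsupportedOperationException rather than failing silently.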
Example 3-4 shows how to copy a local file to a Hadoop filesystem. We illustrate progress by printing a period every time the progress() method is called by Hadoop, which is after each 64 KB packet of data is written to the datanode pipeline. (Note that this particular behavior is not specified by the API, so it is subject to change in later versions of Hadoop. The API merely allows you to infer that "something is happening.")
Example 3-4. Copying a local file to a Hadoop filesystem
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class FileCopyWithProgress {
  public static void main(String[] args) throws Exception {
    String localSrc = args[0];
    String dst = args[1];

    InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(dst), conf);
    // Pass a Progressable so Hadoop calls back as data is written.
    OutputStream out = fs.create(new Path(dst), new Progressable() {
      public void progress() {
        System.out.print(".");
      }
    });

    // Copy with a 4 KB buffer; the final 'true' closes both streams when done.
    IOUtils.copyBytes(in, out, 4096, true);
  }
}
Typical usage:
% hadoop FileCopyWithProgress input/docs/1400-8.txt \
  hdfs://localhost/user/tom/1400-8.txt
.................