The program runs as follows:
% hadoop FileSystemCat hdfs://localhost/user/tom/quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
FSDataInputStream
The open() method on FileSystem actually returns an FSDataInputStream
rather than a standard java.io class. This class is a specialization of
java.io.DataInputStream with support for random access, so you can read from
any part of the stream:
package org.apache.hadoop.fs;

public class FSDataInputStream extends DataInputStream
    implements Seekable, PositionedReadable {
  // implementation elided
}
The Seekable interface permits seeking to a position in the file and provides a query
method for the current offset from the start of the file (getPos()):
public interface Seekable {
  void seek(long pos) throws IOException;
  long getPos() throws IOException;
}
Calling seek() with a position that is greater than the length of the file will result in an
IOException. Unlike the skip() method of java.io.InputStream, which positions the
stream at a point later than the current position, seek() can move to an arbitrary,
absolute position in the file.
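The difference between an absolute seek() and a relative skip() can be sketched without a Hadoop cluster. The listing below is only an illustration: it redeclares the Seekable interface shown above so that it compiles on its own, and InMemorySeekableStream is a hypothetical stand-in backed by a byte array, not Hadoop's FSDataInputStream. It assumes, as described above, that seeking past the end of the data raises an IOException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class SeekVersusSkip {

  // Local copy of the interface shown above, so the sketch compiles without Hadoop.
  interface Seekable {
    void seek(long pos) throws IOException;
    long getPos() throws IOException;
  }

  // Hypothetical in-memory stream used for illustration only.
  static class InMemorySeekableStream extends InputStream implements Seekable {
    private final byte[] data;
    private int pos = 0;

    InMemorySeekableStream(byte[] data) {
      this.data = data;
    }

    @Override
    public int read() {
      // Return the next byte as an unsigned value, or -1 at end of data.
      return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    @Override
    public void seek(long newPos) throws IOException {
      // Absolute positioning: the target does not depend on the current offset.
      if (newPos < 0 || newPos > data.length) {
        throw new IOException(
            "Cannot seek to " + newPos + " in a stream of length " + data.length);
      }
      pos = (int) newPos;
    }

    @Override
    public long getPos() {
      return pos;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] line = "On the top of the Crumpetty Tree".getBytes(StandardCharsets.US_ASCII);
    InMemorySeekableStream in = new InMemorySeekableStream(line);

    in.seek(7);                          // jump straight to offset 7
    System.out.println((char) in.read()); // reads the 't' of "top"

    in.skip(3);                          // relative: moves forward from the current offset
    System.out.println(in.getPos());

    in.seek(0);                          // absolute: back to the start, which skip() cannot do
    System.out.println(in.getPos());

    try {
      in.seek(line.length + 1);          // beyond the end of the data
    } catch (IOException e) {
      System.out.println("seek failed: " + e.getMessage());
    }
  }
}
```

Here skip() is the default implementation inherited from java.io.InputStream, which can only advance the stream, whereas seek() sets the offset directly and so can also move backward.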
A simple extension of Example 3-2 is shown in Example 3-3, which writes a file to
standard output twice: after writing it once, it seeks to the start of the file and streams
through it once again.
Example 3-3. Displaying files from a Hadoop filesystem on standard output twice, by using
seek()
public class FileSystemDoubleCat {
  public static void main(String[] args) throws Exception {
    String uri = args[0];