The program runs as follows:
% hadoop FileSystemCat hdfs://localhost/user/tom/quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
FSDataInputStream
The open() method on FileSystem actually returns an FSDataInputStream rather than a standard java.io class. This class is a specialization of java.io.DataInputStream with support for random access, so you can read from any part of the stream:
package org.apache.hadoop.fs;

public class FSDataInputStream extends DataInputStream
    implements Seekable, PositionedReadable {
  // implementation elided
}
The Seekable interface permits seeking to a position in the file and provides a query method for the current offset from the start of the file (getPos()):
public interface Seekable {
  void seek(long pos) throws IOException;
  long getPos() throws IOException;
}
Calling seek() with a position that is greater than the length of the file will result in an IOException. Unlike the skip() method of java.io.InputStream, which positions the stream at a point later than the current position, seek() can move to an arbitrary, absolute position in the file.
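The contrast between seek() and skip() can be sketched without a Hadoop installation. The example below is only an illustration under stated assumptions: SeekVersusSkip, InMemorySeekableStream, readTwice(), and seekFails() are hypothetical names, the nested Seekable interface is a local copy of the one shown above, and the byte-array stream stands in for a real FSDataInputStream, whose implementation is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SeekVersusSkip {

  // Local copy of the interface shown above, so the sketch compiles without Hadoop.
  interface Seekable {
    void seek(long pos) throws IOException;
    long getPos() throws IOException;
  }

  // Hypothetical in-memory stand-in; this is NOT Hadoop's FSDataInputStream.
  static class InMemorySeekableStream extends InputStream implements Seekable {
    private final byte[] data;
    private int pos = 0;

    InMemorySeekableStream(byte[] data) {
      this.data = data;
    }

    @Override
    public int read() {
      return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    @Override
    public void seek(long newPos) throws IOException {
      // Mirrors the behavior described in the text: seeking past the end fails.
      if (newPos < 0 || newPos > data.length) {
        throw new IOException("Cannot seek to " + newPos
            + " in a stream of length " + data.length);
      }
      pos = (int) newPos;
    }

    @Override
    public long getPos() {
      return pos;
    }
  }

  // Reads the stream to the end, seeks back to the start, and reads it again.
  static String readTwice(byte[] data) {
    InMemorySeekableStream in = new InMemorySeekableStream(data);
    StringBuilder out = new StringBuilder();
    try {
      for (int pass = 0; pass < 2; pass++) {
        int b;
        while ((b = in.read()) != -1) {
          out.append((char) b);
        }
        in.seek(0); // absolute jump back to the first byte
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }

  // Returns true if seeking to the given position raises an IOException.
  static boolean seekFails(byte[] data, long position) {
    try {
      new InMemorySeekableStream(data).seek(position);
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "Quangle".getBytes(StandardCharsets.US_ASCII);
    InMemorySeekableStream in = new InMemorySeekableStream(data);

    in.seek(2);                       // absolute: position is now 2
    in.skip(3);                       // relative: moves forward from 2 to 5
    System.out.println(in.getPos());  // prints 5

    in.seek(1);                       // absolute again, and backward this time
    System.out.println(in.getPos());  // prints 1

    System.out.println(readTwice(data));       // prints QuangleQuangle
    System.out.println(seekFails(data, 100));  // prints true
  }
}
```

The readTwice() helper follows the same read, seek(0), read-again pattern on an in-memory array that the text describes for files; against a real filesystem, the stream returned by open() would take the place of the stand-in class.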
A simple extension of Example 3-2 is shown in Example 3-3, which writes a file to standard output twice: after writing it once, it seeks to the start of the file and streams through it once again.
Example 3-3. Displaying files from a Hadoop filesystem on standard output twice, by using seek()
public class FileSystemDoubleCat {

  public static void main(String[] args) throws Exception {
    String uri = args[0];