Database Reference
In-Depth Information
Path p
=
new
Path
(
"p"
);
fs
.
create
(
p
);
assertThat
(
fs
.
exists
(
p
),
is
(
true
));
However, any content written to the file is not guaranteed to be visible, even if the stream
is flushed. So, the file appears to have a length of zero:
Path p
=
new
Path
(
"p"
);
OutputStream out
=
fs
.
create
(
p
);
out
.
write
(
"content"
.
getBytes
(
"UTF-8"
));
out
.
flush
();
assertThat
(
fs
.
getFileStatus
(
p
).
getLen
(),
is
(
0L
));
Once more than a block's worth of data has been written, the first block will be visible to
new readers. This is true of subsequent blocks, too: it is always the current block being
written that is not visible to other readers.
HDFS provides a way to force all buffers to be flushed to the datanodes via the
hflush()
method on
FSDataOutputStream
. After a successful return from
hflush()
, HDFS guarantees that the data written up to that point in the file has reached
all the datanodes in the write pipeline and is visible to all new readers:
Path p
=
new
Path
(
"p"
);
FSDataOutputStream out
=
fs
.
create
(
p
);
out
.
write
(
"content"
.
getBytes
(
"UTF-8"
));
out
.
hflush
();
assertThat
(
fs
.
getFileStatus
(
p
).
getLen
(),
is
(((
long
)
"content"
.
length
())));
Note that
hflush()
does not guarantee that the datanodes have written the data to disk,
only that it's in the datanodes' memory (so in the event of a data center power outage, for
The behavior of
hsync()
is similar to that of the
fsync()
system call in POSIX that
commits buffered data for a file descriptor. For example, using the standard Java API to
write a local file, we are guaranteed to see the content after flushing the stream and syn-
chronizing:
FileOutputStream out
=
new
FileOutputStream
(
localFile
);
out
.
write
(
"content"
.
getBytes
(
"UTF-8"
));
out
.
flush
();
// flush to operating system
out
.
getFD
().
sync
();
// sync to disk
assertThat
(
localFile
.
length
(),
is
(((
long
)
"content"
.
length
())));
Closing a file in HDFS performs an implicit
hflush()
, too: