since a newer namenode (or datanode) will not operate if its storage layout is an older version.
The namespaceID is a unique identifier for the filesystem namespace, which is created when the namenode is first formatted. The clusterID is a unique identifier for the HDFS cluster as a whole; this is important for HDFS federation (see HDFS Federation), where a cluster is made up of multiple namespaces and each namespace is managed by one namenode. The blockpoolID is a unique identifier for the block pool containing all the files in the namespace managed by this namenode.
The cTime property marks the creation time of the namenode's storage. For newly formatted storage, the value is always zero, but it is updated to a timestamp whenever the filesystem is upgraded.
The storageType indicates that this storage directory contains data structures for a namenode.
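The properties described above live in a plain key=value file, so they are easy to inspect programmatically. The following is a minimal sketch of doing so; the helper parse_properties and every value in SAMPLE are illustrative assumptions, not output from a real cluster.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Illustrative values only; a real storage directory will differ.
SAMPLE = """\
# hypothetical example
namespaceID=123456789
clusterID=CID-example
cTime=0
storageType=NAME_NODE
blockpoolID=BP-example
layoutVersion=-57
"""

props = parse_properties(SAMPLE)

# A cTime of zero means the storage was formatted and never upgraded.
freshly_formatted = int(props["cTime"]) == 0
print(props["storageType"], "freshly formatted:", freshly_formatted)
```

Reading cTime as an integer and comparing it with zero is enough to tell newly formatted storage apart from storage that has been through an upgrade.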
The in_use.lock file is a lock file that the namenode uses to lock the storage directory. This prevents another namenode instance from running at the same time with (and possibly corrupting) the same storage directory.
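The idea behind a lock file can be sketched with an advisory file lock. This is only an illustration of the technique on a Unix-like system, not the namenode's own code; the function try_lock is a hypothetical helper, and the storage directory is a temporary one created for the demonstration.

```python
import fcntl
import os
import tempfile


def try_lock(storage_dir):
    """Try to take an exclusive lock on in_use.lock.

    Returns the open file handle on success (keep it open to hold the lock),
    or None if another holder already has the lock.
    """
    handle = open(os.path.join(storage_dir, "in_use.lock"), "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


storage_dir = tempfile.mkdtemp()   # stand-in for a real storage directory
first = try_lock(storage_dir)      # first instance acquires the lock
second = try_lock(storage_dir)     # second instance is refused
print("first holds lock:", first is not None)
print("second holds lock:", second is not None)
```

The second attempt fails for as long as the first handle stays open, which is the property that keeps two instances out of the same directory.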
The other files in the namenode's storage directory are the edits and fsimage files, and seen_txid. To understand what these files are for, we need to dig into the workings of the namenode a little more.
The filesystem image and edit log
When a filesystem client performs a write operation (such as creating or moving a file), the transaction is first recorded in the edit log. The namenode also has an in-memory representation of the filesystem metadata, which it updates after the edit log has been modified. The in-memory metadata is used to serve read requests.
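The ordering described above (log first, then memory, with reads served from memory) can be sketched in a few lines. The class TinyNamespace and its methods are hypothetical names invented for this illustration, and a Python list stands in for the on-disk edit log.

```python
class TinyNamespace:
    """Toy model of the write path: record in the log, then update memory."""

    def __init__(self):
        self.edit_log = []   # stand-in for the on-disk edit log
        self.next_txid = 1
        self.files = {}      # in-memory metadata, keyed by path

    def _log(self, op, *args):
        txid = self.next_txid
        self.edit_log.append((txid, op, args))
        self.next_txid += 1
        return txid

    def create(self, path):
        if path in self.files:
            raise FileExistsError(path)
        txid = self._log("create", path)   # 1. record the transaction
        self.files[path] = {}              # 2. then update memory
        return txid

    def rename(self, src, dst):
        if src not in self.files:
            raise FileNotFoundError(src)
        txid = self._log("rename", src, dst)
        self.files[dst] = self.files.pop(src)
        return txid

    def exists(self, path):
        # Reads never touch the log; they are answered from memory.
        return path in self.files


ns = TinyNamespace()
ns.create("/a")
ns.rename("/a", "/b")
print(ns.exists("/b"), ns.edit_log)
```

Because every change reaches the log before memory is touched, the log always contains at least as much history as the in-memory view.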
Conceptually the edit log is a single entity, but it is represented as a number of files on disk. Each file is called a segment, and has the prefix edits and a suffix that indicates the transaction IDs contained in it. Only one file is open for writes at any one time (edits_inprogress_0000000000000000020 in the preceding example), and it is flushed and synced after every transaction before a success code is returned to the client. For namenodes that write to multiple directories, the write must be flushed and synced to every copy before returning successfully. This ensures that no transaction is lost due to machine failure.
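As a rough sketch of the two ideas in this paragraph, the code below interprets segment filenames and appends a record to every copy of the open segment, returning only once each copy has been flushed and synced. It assumes that finalized segments are named edits_<first>-<last>; the helpers describe_segment and append_transaction, the record bytes, and the temporary directories are all hypothetical.

```python
import os
import re
import tempfile

# Assumed naming: edits_inprogress_<first> or edits_<first>-<last>.
SEGMENT_RE = re.compile(r"^edits_(?:inprogress_(\d+)|(\d+)-(\d+))$")


def describe_segment(name):
    """Return the transaction ID range encoded in a segment name, or None."""
    m = SEGMENT_RE.match(name)
    if m is None:
        return None
    if m.group(1) is not None:
        return {"first_txid": int(m.group(1)), "last_txid": None,
                "in_progress": True}
    return {"first_txid": int(m.group(2)), "last_txid": int(m.group(3)),
            "in_progress": False}


def append_transaction(handles, record):
    """Write record to every copy; report success only after all are synced."""
    for handle in handles:
        handle.write(record)
        handle.flush()              # move data from Python's buffer to the OS
        os.fsync(handle.fileno())   # ask the OS to push it to the device
    return True


name = "edits_inprogress_0000000000000000020"
dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]   # two storage directories
handles = [open(os.path.join(d, name), "ab") for d in dirs]
ok = append_transaction(handles, b"hypothetical-record\n")
for handle in handles:
    handle.close()
print(describe_segment(name), ok)
```

Syncing each copy before reporting success is what makes the acknowledgment meaningful: once the caller sees success, every directory already holds the record.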
Each fsimage file is a complete persistent checkpoint of the filesystem metadata. (The suffix indicates the last transaction in the image.) However, it is not updated for every