since a newer namenode (or datanode) will not operate if its storage layout is an older
version. Upgrading HDFS is covered in Upgrades.
The namespaceID is a unique identifier for the filesystem namespace, which is created
when the namenode is first formatted. The clusterID is a unique identifier for the
HDFS cluster as a whole; this is important for HDFS federation (see HDFS Federation),
where a cluster is made up of multiple namespaces and each namespace is managed by
one namenode. The blockpoolID is a unique identifier for the block pool containing
all the files in the namespace managed by this namenode.
The cTime property marks the creation time of the namenode's storage. For newly
formatted storage, the value is always zero, but it is updated to a timestamp whenever the
filesystem is upgraded.
The storageType indicates that this storage directory contains data structures for a
namenode.
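The properties described above live in a simple key=value text file, so they are easy to inspect programmatically. The following is a minimal sketch, assuming the file uses Java-properties-style lines with # comments. The sample values and the helper names parse_version and never_upgraded are made up for illustration and do not come from a real cluster.

```python
def parse_version(text):
    """Parse key=value lines into a dict, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def never_upgraded(props):
    """cTime is zero for newly formatted storage and a timestamp after an upgrade."""
    return props.get("cTime") == "0"


# Hypothetical contents: every ID below is a placeholder, not a real value.
SAMPLE_VERSION = """\
# illustrative VERSION file
namespaceID=123456789
clusterID=CID-example
cTime=0
storageType=NAME_NODE
blockpoolID=BP-example
"""

props = parse_version(SAMPLE_VERSION)
print(props["storageType"], never_upgraded(props))
```

A check like never_upgraded only reflects the cTime rule stated above; it says nothing about the layout version of the storage.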
The in_use.lock file is a lock file that the namenode uses to lock the storage directory.
This prevents another namenode instance from using (and possibly corrupting) the same
storage directory at the same time.
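The idea of guarding a directory with a lock file can be sketched in a few lines. This is not how the namenode itself is implemented; it is an illustration of the general technique using an advisory lock from Python's fcntl module, which is available on Unix-like systems only. The function name try_lock_storage_dir is hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock_storage_dir(storage_dir):
    """Try to take an exclusive, non-blocking lock on <storage_dir>/in_use.lock.

    Returns the open file object on success (keep it open to hold the lock),
    or None if some other holder already owns the lock.
    """
    path = os.path.join(storage_dir, "in_use.lock")
    handle = open(path, "a+")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as storage_dir:
        first = try_lock_storage_dir(storage_dir)
        second = try_lock_storage_dir(storage_dir)  # refused while first is held
        print(first is not None, second is None)
        first.close()  # closing the file releases the lock
```

The lock is released when the holder closes the file or its process exits, so a crashed holder does not leave the directory locked forever.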
The other files in the namenode's storage directory are the edits and fsimage files, and
seen_txid. To understand what these files are for, we need to dig into the workings of the
namenode a little more.
The filesystem image and edit log
When a filesystem client performs a write operation (such as creating or moving a file),
the transaction is first recorded in the edit log. The namenode also has an in-memory
representation of the filesystem metadata, which it updates after the edit log has been
modified. The in-memory metadata is used to serve read requests.
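The ordering described above (log first, then update memory, then serve reads from memory) can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions: the edit log is a Python list rather than a file, the metadata is just a set of paths, and the class TinyNamespace and function replay are hypothetical names.

```python
class TinyNamespace:
    """Toy model of log-then-apply ordering; not the real namenode."""

    def __init__(self):
        self.edit_log = []   # stands in for the on-disk edit log
        self.paths = set()   # in-memory metadata used for reads
        self.next_txid = 1

    def _log(self, op, *args):
        self.edit_log.append((self.next_txid, op) + args)
        self.next_txid += 1

    def create(self, path):
        self._log("create", path)   # 1. record the transaction first
        self.paths.add(path)        # 2. then update the in-memory state

    def rename(self, src, dst):
        if src not in self.paths:
            raise FileNotFoundError(src)
        self._log("rename", src, dst)
        self.paths.remove(src)
        self.paths.add(dst)

    def exists(self, path):
        return path in self.paths   # reads never touch the log


def replay(edit_log):
    """Rebuild the in-memory state by applying logged transactions in order."""
    paths = set()
    for _txid, op, *args in edit_log:
        if op == "create":
            paths.add(args[0])
        elif op == "rename":
            paths.remove(args[0])
            paths.add(args[1])
    return paths


ns = TinyNamespace()
ns.create("/a")
ns.rename("/a", "/b")
print(ns.exists("/b"), replay(ns.edit_log) == ns.paths)
```

Because every change reaches the log before it reaches memory, replaying the log from the start reproduces the same in-memory state.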
Conceptually, the edit log is a single entity, but it is represented as a number of files on
disk. Each file is called a segment, and has the prefix edits and a suffix that indicates the
transaction IDs contained in it. Only one file is open for writes at any one time
(edits_inprogress_0000000000000000020 in the preceding example), and it is flushed and
synced after every transaction before a success code is returned to the client. For
namenodes that write to multiple directories, the write must be flushed and synced to every
copy before returning successfully. This ensures that no transaction is lost due to machine
failure.
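Two ideas from this paragraph lend themselves to a short sketch: reading transaction IDs out of segment filenames, and flushing and syncing a write to every copy before reporting success. The code below assumes finalized segments are named edits_<first>-<last> and the open segment edits_inprogress_<first>. The record format and the function names parse_segment and append_transaction are hypothetical, and the real edit log format is binary and more involved.

```python
import os
import re
import tempfile

FINALIZED = re.compile(r"^edits_(\d+)-(\d+)$")
IN_PROGRESS = re.compile(r"^edits_inprogress_(\d+)$")


def parse_segment(name):
    """Return (first_txid, last_txid, in_progress), or None for other files."""
    match = FINALIZED.match(name)
    if match:
        return int(match.group(1)), int(match.group(2)), False
    match = IN_PROGRESS.match(name)
    if match:
        return int(match.group(1)), None, True
    return None


def append_transaction(open_segments, record):
    """Write record to every copy, flushing and syncing each one.

    Success is reported only after all copies have reached the disk.
    """
    for segment in open_segments:
        segment.write(record)
        segment.flush()               # push Python's buffer to the OS
        os.fsync(segment.fileno())    # ask the OS to write through to disk
    return True


if __name__ == "__main__":
    name = "edits_inprogress_0000000000000000020"
    with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
        copies = [open(os.path.join(d, name), "ab") for d in (d1, d2)]
        print(parse_segment(name), append_transaction(copies, b"txn-20\n"))
        for copy in copies:
            copy.close()
```

Syncing after every transaction costs latency on each write, which is the price of not losing acknowledged transactions when a machine fails.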
Each fsimage file is a complete persistent checkpoint of the filesystem metadata. (The
suffix indicates the last transaction in the image.) However, it is not updated for every