filesystem write operation, because writing out the fsimage file, which can grow to be gigabytes in size, would be very slow. This does not compromise resilience because if the namenode fails, then the latest state of its metadata can be reconstructed by loading the latest fsimage from disk into memory, and then applying each of the transactions from the relevant point onward in the edit log. In fact, this is precisely what the namenode does when it starts up (see Safe Mode).
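The recovery rule just described (load the latest image, then replay only the later transactions) can be sketched in a few lines of Python. This is a toy model, not HDFS code: the Edit record, the two operations "mkdir" and "delete", and the recover function are all hypothetical names chosen for illustration, and the namespace is reduced to a set of paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One edit-log transaction in this toy model (hypothetical structure)."""
    txid: int   # transaction ID, assumed to increase with every edit
    op: str     # "mkdir" or "delete" (the only operations modeled here)
    path: str


def apply_edit(namespace, edit):
    """Apply a single transaction to the in-memory namespace (a set of paths)."""
    if edit.op == "mkdir":
        namespace.add(edit.path)
    elif edit.op == "delete":
        namespace.discard(edit.path)
    else:
        raise ValueError(f"unknown operation: {edit.op}")


def recover(image_namespace, image_txid, edit_log):
    """Rebuild the latest metadata state from an image plus the edit log.

    image_txid is the last transaction already reflected in the image, so
    only edits with a larger txid ("the relevant point onward") are replayed.
    """
    namespace = set(image_namespace)   # load the image into memory
    last_applied = image_txid
    for edit in sorted(edit_log, key=lambda e: e.txid):
        if edit.txid <= image_txid:
            continue                   # already captured by the image
        apply_edit(namespace, edit)
        last_applied = edit.txid
    return namespace, last_applied
```

With an image that covers transactions 1 and 2, only transactions 3 and later are replayed, which is why a recent image keeps startup short.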
NOTE
Each fsimage file contains a serialized form of all the directory and file inodes in the filesystem. Each inode is an internal representation of a file or directory's metadata and contains such information as the file's replication level, modification and access times, access permissions, block size, and the blocks the file is made up of. For directories, the modification time, permissions, and quota metadata are stored.
An fsimage file does not record the datanodes on which the blocks are stored. Instead, the namenode keeps this mapping in memory, which it constructs by asking the datanodes for their block lists when they join the cluster and periodically afterward to ensure the namenode's block mapping is up to date.
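As a rough illustration of how a block-to-datanode mapping can be kept purely in memory and refreshed from reported block lists, consider the following Python sketch. The BlockMap class and its method names are hypothetical and do not correspond to HDFS classes; the sketch also assumes that each report is a complete list of the blocks a datanode holds.

```python
from collections import defaultdict


class BlockMap:
    """Toy in-memory mapping from block ID to the datanodes holding it."""

    def __init__(self):
        self._locations = defaultdict(set)   # block_id -> set of datanode IDs

    def process_block_report(self, datanode, block_ids):
        """Replace everything known about one datanode with its latest report."""
        reported = set(block_ids)
        # Forget blocks this datanode no longer reports.
        for block_id in list(self._locations):
            holders = self._locations[block_id]
            if datanode in holders and block_id not in reported:
                holders.discard(datanode)
                if not holders:
                    del self._locations[block_id]
        # Record every block it does report.
        for block_id in reported:
            self._locations[block_id].add(datanode)

    def locations(self, block_id):
        """Return a copy of the set of datanodes known to hold the block."""
        return set(self._locations.get(block_id, ()))
```

Because nothing here is persisted, the mapping is empty after a restart and fills in again as datanodes report, which matches the reason the image file does not need to store block locations.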
As described, the edit log would grow without bound (even if it was spread across several physical edits files). Though this state of affairs would have no impact on the system while the namenode is running, if the namenode were restarted, it would take a long time to apply each of the transactions in its (very long) edit log. During this time, the filesystem would be offline, which is generally undesirable.
The solution is to run the secondary namenode, whose purpose is to produce checkpoints. The checkpointing process works as follows (it is shown schematically in Figure 11-1 for the edit log and image files shown earlier):
1. The secondary asks the primary to roll its in-progress edits file, so new edits go to a new file. The primary also updates the seen_txid file in all its storage directories.
2. The secondary retrieves the latest fsimage and edits files from the primary (using HTTP GET).
3. The secondary loads fsimage into memory, applies each transaction from edits, then creates a new merged fsimage file.
4. The secondary sends the new fsimage back to the primary (using HTTP PUT), and the primary saves it as a temporary .ckpt file.
5. The primary renames the temporary fsimage file to make it available.
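The five steps above can be traced with a small in-memory simulation. This is only a sketch under stated assumptions: the Primary class, the checkpoint function, and all attribute names are hypothetical, the HTTP transfers are replaced by direct attribute access, files are modeled as Python objects, and the value stored in seen_txid (the first transaction ID of the new segment) is a simplification chosen for this model.

```python
class Primary:
    """Toy primary namenode; every name here is hypothetical."""

    def __init__(self):
        self.fsimage = (0, frozenset())  # (last txid covered, set of paths)
        self.finalized_edits = []        # rolled (closed) edits segments
        self.in_progress = []            # segment currently receiving edits
        self.next_txid = 1
        self.seen_txid = 0               # simplified stand-in for the seen_txid file
        self.ckpt = None                 # stand-in for the temporary .ckpt file

    def log(self, op, path):
        """Record one edit ("mkdir" or "delete") in the in-progress segment."""
        self.in_progress.append((self.next_txid, op, path))
        self.next_txid += 1

    def roll_edits(self):
        """Step 1: close the current segment so new edits go to a new one."""
        if self.in_progress:
            self.finalized_edits.append(self.in_progress)
        self.in_progress = []
        self.seen_txid = self.next_txid  # assumption: first txid of the new segment

    def receive_checkpoint(self, image):
        """Step 4: store the uploaded image as a temporary checkpoint."""
        self.ckpt = image

    def promote_checkpoint(self):
        """Step 5: make the temporary checkpoint the current image."""
        self.fsimage, self.ckpt = self.ckpt, None


def checkpoint(primary):
    """Play the secondary's role for one checkpoint cycle."""
    primary.roll_edits()                                # step 1
    image_txid, image_paths = primary.fsimage           # step 2: fetch image
    segments = [list(s) for s in primary.finalized_edits]  # step 2: fetch edits
    namespace = set(image_paths)                        # step 3: load image
    last = image_txid
    for segment in segments:                            # step 3: replay edits
        for txid, op, path in segment:
            if txid <= image_txid:
                continue                                # already in the image
            if op == "mkdir":
                namespace.add(path)
            elif op == "delete":
                namespace.discard(path)
            last = txid
    primary.receive_checkpoint((last, frozenset(namespace)))  # step 4
    primary.promote_checkpoint()                        # step 5
```

In this model the primary keeps accepting edits into its new in-progress segment while the merge runs, so the merged image never includes transactions logged after the roll.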
At the end of the process, the primary has an up-to-date fsimage file and a short in-progress edits file (it is not necessarily empty, as it may have received some edits while the