Maintenance
Routine Administration Procedures
Metadata backups
If the namenode's persistent metadata is lost or damaged, the entire filesystem is rendered unusable, so it is critical that backups are made of these files. You should keep multiple copies of different ages (one hour, one day, one week, and one month, say) to protect against corruption, either in the copies themselves or in the live files running on the namenode.
A straightforward way to make backups is to use the dfsadmin command to download a copy of the namenode's most recent fsimage:
% hdfs dfsadmin -fetchImage fsimage.backup
You can write a script to run this command from an offsite location to store archive copies of the fsimage. The script should additionally test the integrity of the copy. This can be done by starting a local namenode daemon and verifying that it has successfully read the fsimage and edits files into memory (by scanning the namenode log for the appropriate success message, for example).[78]
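A minimal sketch of such a backup script is shown below. The backup directory, timestamp format, and retention count are assumptions, and the integrity check here uses the offline image viewer (hdfs oiv) to confirm that the downloaded copy parses, which is a lighter-weight alternative to starting a local namenode daemon as described above.

#!/usr/bin/env bash
# Sketch of an fsimage backup script. The backup location, retention policy,
# and use of hdfs oiv for the integrity check are assumptions; adapt to your site.
set -e

BACKUP_DIR=/backups/hdfs-metadata          # assumed archive location (e.g., offsite mount)
DEST="$BACKUP_DIR/$(date +%Y%m%d-%H%M)"
mkdir -p "$DEST"

# Download the most recent fsimage from the active namenode
hdfs dfsadmin -fetchImage "$DEST"

# Sanity check: the copy must be parseable by the offline image viewer
hdfs oiv -p XML -i "$DEST"/fsimage_* -o /dev/null

# Keep only the 48 most recent copies (roughly two days of hourly backups)
ls -1dt "$BACKUP_DIR"/*/ | tail -n +49 | xargs -r rm -rf

Run from cron on a machine outside the cluster so that a copy survives the loss of the namenode host itself.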
Data backups
Although HDFS is designed to store data reliably, data loss can occur, just like in any storage system; thus, a backup strategy is essential. With the large data volumes that Hadoop can store, deciding what data to back up and where to store it is a challenge. The key here is to prioritize your data. The highest priority is data that cannot be regenerated and that is critical to the business. The lowest priority is data that is either straightforward to regenerate or essentially disposable because it is of limited business value, and you may choose not to make backups of this low-priority data.