service status hmonitor-namenode-monitor
An important concept to understand is that Hadoop has significant
high-availability and robustness features built in to it to withstand
unscheduled downtime. Hadoop is built with the understanding that
hardware does fail. One of the architectural goals of the Hadoop Distributed
File System (HDFS) is the automatic recovery from any failures. With
hundreds of servers in your big data solution, there will always be some
nonfunctional hardware, and the architecture of HDFS is intended to handle
these failures gracefully while repairs are made. The three places of concern
are DataNode failures, NameNode failures, and network partitions.
A DataNode may become unresponsive for any number of reasons. It might
suffer a hardware failure such as a failed motherboard or hard drive, or one
of its replicas might become corrupted, among the many other reasons
equipment fails. Each DataNode sends a heartbeat to the
NameNode on a regular basis. If the NameNode does not receive a heartbeat
message within its timeout window, it marks the DataNode as dead and stops
sending new data requests to it. Any data that was on that DataNode is no
longer available to HDFS, so the replication factor of that data will likely
fall below the specified value, which kicks off re-replication of that data
to another DataNode.
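The heartbeat and re-replication behavior described above can be illustrated with a small simulation. This is a minimal sketch, not HDFS code: the ToyNameNode class and its methods are hypothetical names invented for illustration. The timeout arithmetic assumes the commonly documented defaults for dfs.heartbeat.interval (3 seconds) and dfs.namenode.heartbeat.recheck-interval (5 minutes); property names and defaults can differ between Hadoop versions and distributions, so check your own configuration.

```python
from dataclasses import dataclass, field

# Assumed defaults (verify against your cluster's configuration).
HEARTBEAT_INTERVAL_S = 3    # modelled on dfs.heartbeat.interval
RECHECK_INTERVAL_S = 300    # modelled on dfs.namenode.heartbeat.recheck-interval
# A DataNode is treated as dead after 2 * recheck + 10 * heartbeat seconds.
DEAD_TIMEOUT_S = 2 * RECHECK_INTERVAL_S + 10 * HEARTBEAT_INTERVAL_S


@dataclass
class ToyNameNode:
    """Hypothetical, heavily simplified stand-in for NameNode bookkeeping."""
    replication_factor: int = 3
    last_heartbeat: dict = field(default_factory=dict)   # node -> timestamp (s)
    block_locations: dict = field(default_factory=dict)  # block -> set of nodes

    def heartbeat(self, node, now):
        # Record the time of the most recent heartbeat from a DataNode.
        self.last_heartbeat[node] = now

    def add_replica(self, block, node):
        # Record that a DataNode holds a replica of a block.
        self.block_locations.setdefault(block, set()).add(node)

    def live_nodes(self, now):
        # Nodes whose last heartbeat is within the dead-node timeout.
        return {n for n, t in self.last_heartbeat.items()
                if now - t <= DEAD_TIMEOUT_S}

    def under_replicated(self, now):
        # Map each block to the number of replicas it is missing,
        # counting only replicas on live nodes.
        live = self.live_nodes(now)
        missing = {}
        for block, nodes in self.block_locations.items():
            live_replicas = len(nodes & live)
            if live_replicas < self.replication_factor:
                missing[block] = self.replication_factor - live_replicas
        return missing


if __name__ == "__main__":
    nn = ToyNameNode()
    for node in ("dn1", "dn2", "dn3", "dn4"):
        nn.heartbeat(node, now=0)
    for node in ("dn1", "dn2", "dn3"):
        nn.add_replica("blk_1", node)
    # dn1 goes silent; the other nodes keep reporting.
    for node in ("dn2", "dn3", "dn4"):
        nn.heartbeat(node, now=700)
    print(nn.under_replicated(now=700))
```

In this sketch, a block that lost a replica on the silent node shows up as under-replicated once the timeout has elapsed; a real NameNode would then schedule a copy of that block to another live DataNode.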
NameNode failures are another reason Hadoop clusters become nonresponsive.
NameNodes are most likely to fail because of
misconfiguration and network issues. This is similar to our collective
experience with Windows cluster failovers, where it is usually not hardware
issues that cause failovers but external issues such as Active Directory
problems, networking, or other configuration issues. Pay heed to these
external influences as you diagnose failures within your environment.
In this section we examined some basics of planning for high availability.
In the next section, we will look at what happens if something does go
wrong: preparing for and dealing with disaster recovery.
Disaster Recovery
By now, you should be aware that Hadoop has many high-availability
features built in to it to prevent failures. As you probably know by this
point in your career, despite all the features included in products and all
the planning we can do, disasters happen. When they do, your service level