Chapter 6
Exploring the HDInsight Name Node
The HDInsight name node is just another virtual machine provisioned in Windows Azure. Conceptually, it is
the equivalent of the traditional Apache Hadoop name node, or head node, which is the heart and soul of your
Hadoop cluster. I would like to reiterate what I pointed out in Chapter 1: the name node is the single point of failure
in a Hadoop cluster. Most important of all, the name node holds the metadata for every storage block in the cluster
and coordinates the data nodes, so a name-node failure could bring down the entire cluster.
There is a Secondary Name Node service (ideally run on a dedicated physical server) that keeps track of the
HDFS changes recorded by the name node and periodically takes a checkpoint (backup) of the name node's metadata.
In addition, you can fail over to the secondary name node in the unlikely event of a name-node failure, but that
failover is a manual process.
Note
The HDInsight Service brings a significant change from the traditional approach taken in Apache Hadoop.
It does so by isolating the storage to a Windows Azure Storage Blob instead of to the traditional Hadoop Distributed
File System (HDFS) that is local to the data nodes.
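To make the difference concrete, here is a minimal Python sketch that classifies a Hadoop default file system URI as blob-backed or HDFS-backed. It relies on the WASB URI form wasb://<container>@<account>.blob.core.windows.net/<path>. The function name describe_default_fs and the sample container, account, and host names are hypothetical, chosen only for illustration; this is not part of any HDInsight or Hadoop API.

```python
from urllib.parse import urlparse


def describe_default_fs(uri: str) -> dict:
    """Classify a Hadoop default file system URI (illustrative helper only)."""
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()

    if scheme in ("wasb", "wasbs"):
        # WASB form: wasb://<container>@<account>.blob.core.windows.net/<path>
        # The container sits where a URL would normally carry a user name.
        return {
            "storage": "azure-blob",
            "container": parsed.username,
            "account_host": parsed.hostname,
            "path": parsed.path or "/",
        }

    if scheme == "hdfs":
        # HDFS form: hdfs://<name-node-host>:<port>/<path>
        return {
            "storage": "hdfs",
            "name_node": parsed.netloc,
            "path": parsed.path or "/",
        }

    raise ValueError(f"Unrecognized file system scheme: {scheme!r}")


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(describe_default_fs(
        "wasb://mycontainer@myaccount.blob.core.windows.net/data/input"))
    print(describe_default_fs("hdfs://namenodehost:9000/user/demo"))
```

In the blob-backed case the URI names a storage account rather than a name-node host, which is one way to see that the data lives outside the cluster.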
In the Windows Azure HDInsight service, the storage is separated from the cluster itself by default; the default
Hadoop file system points to Azure blob storage rather than to the traditional HDFS. If you
recall, we discussed the advantages of using Windows Azure Storage Blob (WASB) in Chapter 2. This separation reduces
the cluster's dependency on the name node to some extent; still, the HDInsight name node continues to be an
integral part of your cluster. You can start a remote desktop session to log on to the name node and get access to the
traditional Apache Hadoop web portals and dashboards. This also gives you access to the Hadoop command prompt
and the various service logs, and it is the old-fashioned way to administer your cluster.
The command prompt continues to be a favorite for many users, who still prefer the command-line way of doing things
in today's world of rich and intuitive user interfaces for almost everything. I often find myself in this category too
because I believe command-line interfaces are the bare minimum and they give you the raw power of your modules by
removing any abstractions in between. It is also good practice to operate your cluster from the command shell when you
test and benchmark performance because the shell does not add any overhead of its own. This chapter focuses on some of
the basic command-line utilities for operating your Hadoop cluster and on the unique features that are implemented in
the HDInsight offering.
Accessing the HDInsight Name Node
You have to enable remote connectivity to your name node from the Azure Management portal. By default, remote
login is turned off. You can enable it from your cluster's configuration screen, as shown in Figure 6-1.
 
 