On all nodes, you change the value of mapred.job.tracker in the file $HADOOP_PREFIX/conf/mapred-site.xml to be:
hc1nn:54311
This defines, on every server, the host and port of the Map Reduce Job Tracker, pointing it at the Name Node machine.
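In mapred-site.xml, that setting takes the standard Hadoop property form, along these lines (a sketch; the property name and value are as given above):

<property>
  <name>mapred.job.tracker</name>
  <value>hc1nn:54311</value>
</property>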
On all nodes, check that the value of dfs.replication in the file $HADOOP_PREFIX/conf/hdfs-site.xml is set to 3.
This means that three copies of each block of data will automatically be kept by HDFS.
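In hdfs-site.xml, the entry would look something like this (a sketch in the standard Hadoop property format):

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>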
In the same file, ensure that the value http://localhost:50070/ for the property dfs.http.address is changed to:
http://hc1nn:50070/
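As a sketch, the corresponding hdfs-site.xml entry would be along these lines; note that the text gives the value as a URL, while Hadoop 1.x configuration normally expects the bare host:port form:

<property>
  <name>dfs.http.address</name>
  <!-- host:port form; the text shows this address as http://hc1nn:50070/ -->
  <value>hc1nn:50070</value>
</property>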
This sets the HDFS web/http address to point to the Name Node master machine hc1nn. With none of the
Hadoop servers running, you format the cluster from the Name Node server—in this instance, hc1nn:
hadoop namenode -format
At this point, a common problem can occur with Hadoop file system versioning between the Name Node and the Data Nodes. Within HDFS, files named VERSION contain version numbering information that is regenerated each time the file system is formatted; for example:
[hadoop@hc1nn dfs]$ pwd
/app/hadoop/tmp/dfs
[hadoop@hc1nn dfs]$ find . -type f -name VERSION -exec grep -H namespaceID {} \;
./data/current/VERSION:namespaceID=1244166645
./name/current/VERSION:namespaceID=1244166645
./name/previous.checkpoint/VERSION:namespaceID=1244166645
./namesecondary/current/VERSION:namespaceID=1244166645
The Linux command shown here, executed as the hadoop user, searches for the VERSION files under /app/hadoop/tmp/dfs and strips the namespace ID information out of them. If this command were executed on both the Name Node server and the Data Node servers, you would expect to see the same value, 1244166645, everywhere. When this versioning gets out of step on the data nodes, an error such as the following occurs:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs
While this problem seems to have two solutions, only one is viable. Although you could delete the data directory
/app/hadoop/tmp/dfs/data on the offending data node, reformat the file system, and then start the servers, this
approach will cause data loss. The second, more effective method involves editing the VERSION files on the data
nodes so that the namespace ID values match those found on the Name Node machine.
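For example, with the Hadoop servers stopped, a single edit on each affected data node can bring the ID into line. This is a sketch only: the path and the ID value 1244166645 are taken from the listing above, and the ID must match the one in your own Name Node's VERSION file:

# Run as the hadoop user on the offending data node, with the servers stopped.
# Replace 1244166645 with the namespaceID from the Name Node's VERSION file.
sed -i 's/^namespaceID=.*/namespaceID=1244166645/' /app/hadoop/tmp/dfs/data/current/VERSION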
You need to ensure that your firewall allows access on the ports that Hadoop uses to communicate. When you attempt to start the Hadoop servers, check the logs in the log directory (/usr/local/hadoop/logs).
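For example, on a CentOS-style system using iptables, rules along these lines would open the two ports mentioned in this section. This is a sketch only; any other ports your Hadoop services listen on must be opened in the same way:

# Allow the Job Tracker port and the HDFS web port through the firewall.
iptables -I INPUT -p tcp --dport 54311 -j ACCEPT
iptables -I INPUT -p tcp --dport 50070 -j ACCEPT
service iptables save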
Now, start the cluster from the Name Node; this time, you will start the HDFS servers using the script start-dfs.sh:
[hadoop@hc1nn logs]$ start-dfs.sh
starting namenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-hc1nn.out
hc1r1m2: starting datanode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-hc1r1m2.out