<property>
<name>mapred.job.tracker</name>
<!-- Locate the JobTracker on the host named "master" -->
<value>master:9001</value>
<description>The host and port that the MapReduce job tracker runs
at.</description>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<!-- Increase the HDFS replication factor -->
<value>3</value>
<description>The actual number of replications can be specified when the
file is created.</description>
</property>
</configuration>
The key differences are:
We explicitly stated the hostnames for the locations of the NameNode and
JobTracker daemons.
We increased the HDFS replication factor to take advantage of distributed
storage. Recall that data is replicated across HDFS to increase availability and
reliability.
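As the dfs.replication description notes, the value in hdfs-site.xml is only a cluster-wide default; individual files can use a different factor. A quick sketch (the file and directory names here are hypothetical, and a running cluster is assumed):

```shell
# Override the default replication factor for one file at write time
# (-D passes a generic configuration option to the fs shell):
bin/hadoop fs -D dfs.replication=2 -put bigfile.dat /data/bigfile.dat

# Or change it for a file that already exists; -w waits for the
# re-replication to complete before returning:
bin/hadoop fs -setrep -w 2 /data/bigfile.dat
```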
We also need to update the masters and slaves files
to reflect the locations of the other
daemons.
[hadoop-user@master]$ cat masters
backup
[hadoop-user@master]$ cat slaves
hadoop1
hadoop2
hadoop3
...
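Distributing the configuration by hand quickly becomes tedious, so a loop over the slaves file is a common shortcut. This is only a sketch: the paths are assumptions for a typical layout, and the leading echo makes it a dry run; remove the echo to perform the actual copy (the master and backup hosts need the same files too).

```shell
# Dry run: print one rsync command per node listed in the slaves file.
# Remove "echo" to actually push the conf/ directory to each worker.
for host in $(cat conf/slaves 2>/dev/null); do
  echo rsync -az conf/ "hadoop-user@${host}:hadoop/conf/"
done
```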
Once you have copied these files across all the nodes in your cluster, be sure to format
HDFS to prepare it for storage:
[hadoop-user@master]$ bin/hadoop namenode -format
Now you can start the Hadoop daemons:
[hadoop-user@master]$ bin/start-all.sh
and verify that each node is running its assigned daemons:
[hadoop-user@master]$ jps
30879 JobTracker
30717 NameNode
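The slave nodes can be spot-checked the same way. A sketch, assuming passwordless SSH to each worker is already set up (start-all.sh requires it anyway); each slave should report a DataNode for HDFS and a TaskTracker for MapReduce:

```shell
# Run jps remotely on every node listed in the slaves file.
for host in $(cat conf/slaves 2>/dev/null); do
  echo "--- ${host} ---"
  ssh "$host" jps
done
```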