Integration with Hadoop - Mastering Apache Cassandra

Setting up local Hadoop

This section discusses how to set up Hadoop 2.6.0 on your local machine; the steps should be valid for all 2.x versions. At the time of writing, Hadoop has transitioned from version 1 to version 2. Version 2.x is now mature, introduces disruptive changes, and is presumably an improvement over version 1. HDFS is federated in the new version: according to the Apache documentation, it scales the NameNode service horizontally using multiple independent NameNodes. This should ideally avoid the single point of failure that the NameNode was in the previous version. The other major improvement is the new MapReduce framework, variously referred to as MRNextGen, MRv2, or Yet Another Resource Negotiator (YARN). More about the new version can be learned on the Apache website (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).

Here are the steps to get Hadoop version 2.x to work on a Linux machine. To keep things generic, I have used the downloaded zipped file to install Hadoop. One can use a binary package for a specific platform without much of a change in the instructions.

The following figure shows Hadoop infrastructure. Heavy-duty master nodes are at the top of the rack servers. Each rack has a rack switch. Slaves run DataNode and TaskTracker services (under YARN in 2.x, the NodeManager takes the TaskTracker's place). Racks are connected through 10GE switches. Note that not all racks will have a master server:

Make sure you can SSH to your localhost using a key-based, password-less login. If you can't do this, generate and set up a key pair as described in the following commands:

# Generate key pair
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /home/ec2-user/.ssh/id_dsa.
Your public key has been saved in /home/ec2-user/.ssh/id_dsa.pub.
The key fingerprint is:
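Generating the key pair is only half of the job: the public key also has to be listed in the authorized_keys file of the account you log in to. The following is a minimal sketch of that step, assuming the key was written to ~/.ssh/id_dsa.pub as in the preceding command. The function name authorize_key is a hypothetical helper of my own, not part of OpenSSH or Hadoop; it only wraps the usual append and tightens file permissions, and it skips the append if the key is already listed.

```shell
#!/bin/sh
# Sketch: authorize a public key for password-less SSH to this machine.
# authorize_key is a hypothetical helper, not an OpenSSH or Hadoop command.
authorize_key() {
    pub_key="$1"                    # public key file, e.g. ~/.ssh/id_dsa.pub
    ssh_dir="${2:-$HOME/.ssh}"      # directory that holds authorized_keys
    auth_file="$ssh_dir/authorized_keys"

    # Refuse to continue if the public key file does not exist.
    [ -f "$pub_key" ] || { echo "no such key: $pub_key" >&2; return 1; }

    # sshd may ignore keys when these permissions are too open.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$auth_file" && chmod 600 "$auth_file"

    # Append only if this exact key line is not already present.
    if ! grep -qxF -f "$pub_key" "$auth_file"; then
        cat "$pub_key" >> "$auth_file"
    fi
}

# Usage (left commented out so that sourcing this file changes nothing):
#   authorize_key ~/.ssh/id_dsa.pub
#   ssh -o BatchMode=yes localhost true && echo "password-less login works"
```

The BatchMode=yes option makes ssh fail instead of prompting for a password, so the last command succeeds only when the key-based login really works.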
