Installation of Spark
By way of example, I install Spark onto a 64-bit cluster using the CDH5 name node machine hc2nn and the data nodes
hc2r1m1 to hc2r1m4. Spark works on a master-slave model, so I use the name node machine hc2nn as the master and
the data node machines as the slaves. Unless stated otherwise, I carry out the installation as the Linux root user.
My first step is to set up a suitable repository file under the directory /etc/yum.repos.d on each machine so that
the Linux yum command knows where and how to source the installation packages:
[root@hc2r1m1 ~]# cd /etc/yum.repos.d
[root@hc2r1m1 yum.repos.d]# cat cloudera-cdh5.repo
[cloudera-cdh5]
# Packages for Cloudera's Distribution for Hadoop, Version 5, on RedHat or CentOS 6 x86_64
name=Cloudera's Distribution for Hadoop, Version 5
baseurl=http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5/
gpgkey = http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1
The repository file (cloudera-cdh5.repo) tells yum to look at the repository URL http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5/ when installing the software. After setting up the repository file on each machine, I'm
ready to install the Spark services on all machines. The following command installs the Spark master server, history
server, and worker server, as well as the core and Python modules:
[root@hc2r1m1 ~]# yum install spark-core spark-master spark-worker spark-history-server spark-python
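Because the repository file and the same set of packages must go onto every node, the per-node steps lend themselves to a short script. The sketch below assumes passwordless root ssh from hc2nn to the data nodes; write_repo_file is a hypothetical helper of my own, not part of CDH or Spark:

```shell
#!/bin/sh
# Sketch only: write_repo_file is a hypothetical helper that writes the
# CDH5 repository file into the given directory.
write_repo_file() {
    cat > "$1/cloudera-cdh5.repo" <<'EOF'
[cloudera-cdh5]
name=Cloudera's Distribution for Hadoop, Version 5
baseurl=http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5/
gpgkey = http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1
EOF
}

# On a real node this would be run as root:
#   write_repo_file /etc/yum.repos.d
# and then, for the remaining nodes (assuming passwordless root ssh):
#   for node in hc2r1m1 hc2r1m2 hc2r1m3 hc2r1m4; do
#       scp /etc/yum.repos.d/cloudera-cdh5.repo "root@${node}:/etc/yum.repos.d/"
#       ssh "root@${node}" yum -y install spark-core spark-master \
#           spark-worker spark-history-server spark-python
#   done
```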
I install these components on each node and then set up the configuration under /etc/spark/conf/, remembering to
make these changes on all servers unless stated otherwise. First, I set up the slaves file so that Spark knows
where the workers will run:
[root@hc2r1m1 ~]# cd /etc/spark/conf/
[root@hc2r1m4 conf]# cat slaves
# A Spark Worker will be started on each of the machines listed below.
hc2r1m1
hc2r1m2
hc2r1m3
hc2r1m4
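Since the slaves file is just a comment followed by one host name per line, it can also be regenerated from the node list. write_slaves_file below is a hypothetical helper used purely for illustration:

```shell
#!/bin/sh
# Sketch only: write_slaves_file is a hypothetical helper that rebuilds
# a Spark slaves file from a list of worker host names.
write_slaves_file() {
    out="$1"; shift
    {
        echo "# A Spark Worker will be started on each of the machines listed below."
        printf '%s\n' "$@"
    } > "$out"
}

# On the real cluster this would be:
#   write_slaves_file /etc/spark/conf/slaves hc2r1m1 hc2r1m2 hc2r1m3 hc2r1m4
# followed by copying /etc/spark/conf/ to every node, for example with scp.
```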
Next, I edit the file spark-env.sh and set the STANDALONE_SPARK_MASTER_HOST variable to the fully qualified
name of the master host:
export STANDALONE_SPARK_MASTER_HOST=hc2nn.semtech-solutions.co.nz
If you set this value incorrectly (for instance, by using the short host name), you may encounter an error like this:
14/09/09 18:20:52 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName]
for non-local recipient [Actor[akka.tcp://sparkMaster@hc2nn:7077/]]
arriving at [akka.tcp://sparkMaster@hc2nn:7077] inbound
addresses are [akka.tcp://sparkMaster@hc2nn.semtech-solutions.co.nz:7077]
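One way to catch this misconfiguration before starting the services is to compare the configured value with the output of hostname -f on the master, which should report the fully qualified name. check_master_host below is a hypothetical helper, a sketch only:

```shell
#!/bin/sh
# Sketch only: check_master_host is a hypothetical helper. Run it on the
# master (hc2nn); it fails if the configured master host does not match
# this machine's fully qualified name as reported by hostname -f.
check_master_host() {
    configured="$1"
    actual="$(hostname -f)"
    if [ "$configured" != "$actual" ]; then
        echo "Mismatch: configured '$configured', hostname -f reports '$actual'" >&2
        return 1
    fi
}

# Example use after sourcing spark-env.sh:
#   check_master_host "$STANDALONE_SPARK_MASTER_HOST" || exit 1
```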