Installation of Spark
By way of example, I install Spark onto a 64-bit cluster using the CDH5 name node machine hc2nn and the data nodes
hc2r1m1 to hc2r1m4. Spark works on a master-slave model, so I use the name node machine hc2nn as the master and
the data node machines as the slaves. Unless stated otherwise, I carry out the installation as the Linux root user.
My first step is to set up a suitable repository file under the directory /etc/yum.repos.d on each machine so that
the Linux yum command knows where and how to source the installation packages:
[root@hc2r1m1 ~]# cd /etc/yum.repos.d
[root@hc2r1m1 yum.repos.d]# cat cloudera-cdh5.repo
[cloudera-cdh5]
# Packages for Cloudera's Distribution for Hadoop, Version 5, on RedHat or CentOS 6 x86_64
name=Cloudera's Distribution for Hadoop, Version 5
baseurl=http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5/
gpgkey = http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1
The repository file (cloudera-cdh5.repo) tells yum to look at the repository URL http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5/ when installing the software. After setting up the repository file on each machine, I'm
ready to install the Spark services on all machines. The following command installs the Spark master server, history
server, and worker server, as well as the core and Python modules:
[root@hc2r1m1 ~]# yum install spark-core spark-master spark-worker spark-history-server spark-python
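Because the repository file and the same set of packages must go onto every node, the per-node steps lend themselves to a short script. The sketch below assumes passwordless root ssh from hc2nn to the data nodes; write_repo_file is a hypothetical helper of my own, not part of CDH or Spark:

```shell
#!/bin/sh
# Sketch only: write_repo_file is a hypothetical helper that writes the
# CDH5 repository file into the given directory.
write_repo_file() {
    cat > "$1/cloudera-cdh5.repo" <<'EOF'
[cloudera-cdh5]
name=Cloudera's Distribution for Hadoop, Version 5
baseurl=http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5/
gpgkey = http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1
EOF
}

# On a real node this would be run as root:
#   write_repo_file /etc/yum.repos.d
# and then, for the remaining nodes (assuming passwordless root ssh):
#   for node in hc2r1m1 hc2r1m2 hc2r1m3 hc2r1m4; do
#       scp /etc/yum.repos.d/cloudera-cdh5.repo "root@${node}:/etc/yum.repos.d/"
#       ssh "root@${node}" yum -y install spark-core spark-master \
#           spark-worker spark-history-server spark-python
#   done
```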
I install these components on each node and then set up the configuration under /etc/spark/conf/, remembering to
make these changes on all servers unless stated otherwise. First, I set up the slaves file so that Spark knows
where the workers will run:
[root@hc2r1m1 ~]# cd /etc/spark/conf/
[root@hc2r1m4 conf]# cat slaves
# A Spark Worker will be started on each of the machines listed below.
hc2r1m1
hc2r1m2
hc2r1m3
hc2r1m4
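Since the slaves file is just a comment followed by one host name per line, it can also be regenerated from the node list. write_slaves_file below is a hypothetical helper used purely for illustration:

```shell
#!/bin/sh
# Sketch only: write_slaves_file is a hypothetical helper that rebuilds
# a Spark slaves file from a list of worker host names.
write_slaves_file() {
    out="$1"; shift
    {
        echo "# A Spark Worker will be started on each of the machines listed below."
        printf '%s\n' "$@"
    } > "$out"
}

# On the real cluster this would be:
#   write_slaves_file /etc/spark/conf/slaves hc2r1m1 hc2r1m2 hc2r1m3 hc2r1m4
# followed by copying /etc/spark/conf/ to every node, for example with scp.
```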
Next, I edit the file spark-env.sh and set the STANDALONE_SPARK_MASTER_HOST variable to the fully qualified
name of the master host:
export STANDALONE_SPARK_MASTER_HOST=hc2nn.semtech-solutions.co.nz
If you set this value incorrectly (for instance, by using the short host name), you may encounter an error like this:
14/09/09 18:20:52 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName]
for non-local recipient [Actor[akka.tcp://sparkMaster@hc2nn:7077/]]
arriving at [akka.tcp://sparkMaster@hc2nn:7077] inbound
addresses are [akka.tcp://sparkMaster@hc2nn.semtech-solutions.co.nz:7077]
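One way to catch this misconfiguration before starting the services is to compare the configured value with the output of hostname -f on the master, which should report the fully qualified name. check_master_host below is a hypothetical helper, a sketch only:

```shell
#!/bin/sh
# Sketch only: check_master_host is a hypothetical helper. Run it on the
# master (hc2nn); it fails if the configured master host does not match
# this machine's fully qualified name as reported by hostname -f.
check_master_host() {
    configured="$1"
    actual="$(hostname -f)"
    if [ "$configured" != "$actual" ]; then
        echo "Mismatch: configured '$configured', hostname -f reports '$actual'" >&2
        return 1
    fi
}

# Example use after sourcing spark-env.sh:
#   check_master_host "$STANDALONE_SPARK_MASTER_HOST" || exit 1
```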