After this job has been started, check that the edits are being posted to
Kafka using the console consumer that ships with Kafka:
$ ./deploy/kafka/bin/kafka-console-consumer.sh \
> --zookeeper localhost:2181 --topic wikipedia-raw
Multinode Samza
Getting Samza going in a multinode environment is very much like setting
up a Hadoop 2 environment, except that HDFS is not required unless other
applications will be using it.
To begin, make sure a ZooKeeper cluster is available, referring to Chapter
3 if necessary for installation and configuration instructions. ZooKeeper is
required for both Samza and the Kafka brokers.
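Before moving on, it can help to confirm that every ZooKeeper server actually answers. The following is a minimal sketch, not part of the original procedure: it assumes `nc` (netcat) is installed and that the servers accept ZooKeeper's four-letter `ruok` command (newer ZooKeeper releases may require it to be explicitly enabled). The function names are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Split a ZooKeeper connect string ("host1:2181,host2:2181") into
# one "host port" pair per line. A missing port defaults to 2181.
zk_split_hosts() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=: read -r host port; do
        printf '%s %s\n' "$host" "${port:-2181}"
    done
}

# Send "ruok" to one server; a healthy server replies "imok".
# Assumes nc supports -w (timeout in seconds).
zk_is_ok() {
    reply=$(printf 'ruok' | nc -w 2 "$1" "$2" 2>/dev/null)
    [ "$reply" = "imok" ]
}

# Report the status of every server named in the connect string.
zk_check_all() {
    zk_split_hosts "$1" | while read -r host port; do
        if zk_is_ok "$host" "$port"; then
            echo "$host:$port ok"
        else
            echo "$host:$port NOT RESPONDING"
        fi
    done
}
```

Calling `zk_check_all localhost:2181` against the single-node setup used earlier would print one status line per server. The parsing is kept separate from the network check so that the same host list can later be reused for the Kafka and Samza configuration.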
Next, install and configure Kafka brokers on the machines destined for the
YARN grid except for the machine to be used as the ResourceManager.
Samza makes heavy use of Kafka for communication, so its brokers are
usually co-located with the NodeManagers that make up the Samza YARN
grid. For help with installing and configuring Kafka, refer to Chapter 4.
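The layout described above (Kafka brokers on every grid machine except the ResourceManager) can be sketched as a small host-list helper. This is only an illustration: the host names and the function names are hypothetical, and port 9092 is assumed to be the broker port because it is Kafka's default.

```shell
#!/bin/sh
# Print every host except the ResourceManager, one per line.
# Usage: worker_hosts RM_HOST HOST...
worker_hosts() {
    rm_host=$1
    shift
    for h in "$@"; do
        [ "$h" = "$rm_host" ] || printf '%s\n' "$h"
    done
}

# Build a comma-separated "host:port" broker list from the worker hosts.
# Usage: broker_list PORT RM_HOST HOST...
broker_list() {
    port=$1
    shift
    worker_hosts "$@" | sed "s/\$/:$port/" | paste -sd, -
}
```

For example, `broker_list 9092 rm1 rm1 node1 node2` yields `node1:9092,node2:9092`, the kind of list a Kafka client configuration expects. The same `worker_hosts` output doubles as the list of machines that will run NodeManagers.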
Now that ZooKeeper and Kafka have been installed, set up the YARN
cluster. In this section it is assumed that you are constructing this YARN
cluster from the Apache build rather than one of the commercial
distributions. It is also assumed that the package manager for
the operating system being used does not have suitable packages
available for Hadoop 2.2.0 (many still have older Hadoop 1.0.3 packages).
First, download and unpack the Apache binary distribution, which is at
version 2.2.0 at the time of writing, onto each of the machines:
$ wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$ cd /usr/local
$ sudo tar xvfz ~/hadoop-2.2.0.tar.gz
$ sudo ln -s hadoop-2.2.0/ hadoop
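Because these steps must be repeated on each machine, they can be wrapped in a loop. The sketch below is one possible approach rather than a prescribed one: it assumes passwordless `ssh` access and `sudo` rights on every host, the host names are placeholders, and the function names are hypothetical. It defaults to a dry run that only prints what would be executed, and it uses `ln -sfn` instead of `ln -s` so that rerunning it replaces an existing link.

```shell
#!/bin/sh
HADOOP_VERSION=2.2.0
# Mirror taken from the text above; it may no longer carry this release.
HADOOP_URL="http://mirrors.sonic.net/apache/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz"

# Emit the install steps for one machine as a single command line.
# Usage: install_commands VERSION URL
install_commands() {
    v=$1
    url=$2
    printf 'cd ~ && wget -q %s && cd /usr/local && sudo tar xfz ~/hadoop-%s.tar.gz && sudo ln -sfn hadoop-%s hadoop\n' "$url" "$v" "$v"
}

# Print (any mode other than "run") or execute ("run") the install on each host.
# Usage: install_on_hosts MODE HOST...
install_on_hosts() {
    mode=$1
    shift
    for h in "$@"; do
        cmd=$(install_commands "$HADOOP_VERSION" "$HADOOP_URL")
        if [ "$mode" = "run" ]; then
            # -t allocates a terminal so sudo can prompt for a password
            ssh -t "$h" "$cmd"
        else
            printf 'ssh -t %s "%s"\n' "$h" "$cmd"
        fi
    done
}
```

Running `install_on_hosts dry node1 node2` prints one `ssh` command per host for review; switching the first argument to `run` executes them.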
Apache YARN relies on a number of environment variables to tell it where
to find its various packages. There are a number of places these can be set,
but to have them active for all users they should be set in either /etc/profile