<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resource-manager.mydomain.net</value>
</property>
</configuration>
It should now be possible to start the ResourceManager and NodeManagers on
the Samza grid. To start the ResourceManager, log in to the ResourceManager
machine (resource-manager.mydomain.net in the configuration above) and use
yarn-daemon.sh to start the server:
$ yarn-daemon.sh --config $YARN_HOME/etc/hadoop start resourcemanager
Then, on each of the nodes in the Samza grid, start the NodeManager in the
same way:
$ yarn-daemon.sh --config $YARN_HOME/etc/hadoop start nodemanager
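Logging in to every node by hand becomes tedious as the grid grows. The following is a minimal sketch, assuming passwordless SSH from the current machine to each node, and that YARN_HOME is set and yarn-daemon.sh is on the PATH in each node's non-interactive shell. It loops over a hypothetical hosts file (one hostname per line) and runs the same command remotely; start_nodemanagers is an illustrative name, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: start the NodeManager on every Samza grid node from one machine.
# Assumptions: passwordless SSH to each node; YARN_HOME is set and
# yarn-daemon.sh is on the PATH in the remote non-interactive shell.
# The hosts file (one hostname per line) is hypothetical.

start_nodemanagers() {
  local hosts_file="$1" host
  # "|| [ -n "$host" ]" also handles a last line with no trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue   # skip blank lines
    # -n stops ssh from consuming the loop's stdin (the hosts file).
    # Single quotes defer $YARN_HOME expansion to the remote shell.
    ssh -n "$host" 'yarn-daemon.sh --config $YARN_HOME/etc/hadoop start nodemanager'
  done < "$hosts_file"
}
```

For example, `start_nodemanagers nodemanager-hosts.txt`, where nodemanager-hosts.txt is a hypothetical file listing the grid's nodes.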
The ResourceManager starts a web server on port 8088 by default, which can be
used to check that each of the nodes has reported to the ResourceManager. The
most common problem at this point is an incorrect firewall setting that blocks
traffic between the nodes and the ResourceManager.
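Besides the web UI, the ResourceManager serves a REST endpoint, /ws/v1/cluster/nodes, on the same port, and the `yarn node -list` command reports similar information. The following is a minimal sketch, assuming curl is installed and the example hostname from the configuration above, that prints each expected node that does not appear in the ResourceManager's response. The expected-hosts file and the function names are hypothetical, and the check covers registration only, not node health.

```shell
#!/usr/bin/env bash
# Sketch: report which expected NodeManagers have not registered.
# Assumptions: curl is installed; the RM web server uses the default port
# 8088; RM_HOST defaults to the example hostname from yarn-site.xml.
RM_HOST="${RM_HOST:-resource-manager.mydomain.net}"

# Fetch the node list from the ResourceManager REST API as JSON.
fetch_nodes_json() {
  curl -s -H 'Accept: application/json' \
    "http://${RM_HOST}:8088/ws/v1/cluster/nodes"
}

# Read that JSON on stdin and print every "nodeHostName" value, one per line.
list_node_hosts() {
  grep -o '"nodeHostName" *: *"[^"]*"' | cut -d'"' -f4
}

# Print each host listed in the given file (hypothetical, one per line)
# that does not appear in the ResourceManager's node list.
missing_nodes() {
  local expected_file="$1" reported host
  reported="$(fetch_nodes_json | list_node_hosts)"
  while IFS= read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue
    printf '%s\n' "$reported" | grep -qxF "$host" || echo "$host"
  done < "$expected_file"
}
```

Any host printed by `missing_nodes expected-hosts.txt` is a good place to start looking for a firewall problem.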
Integrating Samza into the Data Flow
Integrating Samza into an existing Kafka environment is straightforward,
as Samza uses Kafka for all communication. If there is an existing set of
brokers already handling production load, simply use MirrorMaker as
described in Chapter 4 to mirror the desired topics into the Samza Kafka
grid. From there, Samza has easy access to the incoming topics.
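As a minimal sketch, assuming a Kafka 2.x installation whose classic MirrorMaker is started with kafka-mirror-maker.sh, the following writes a consumer configuration that reads from the production brokers and a producer configuration that writes to the Samza grid's brokers, then mirrors the topics matching a regular expression. The broker addresses, directory, file names, topic pattern, the samza-mirror group id, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch, assuming Kafka 2.x's classic MirrorMaker (kafka-mirror-maker.sh).
# All broker addresses, file names, and the group id are hypothetical.

# Write the two configs MirrorMaker needs: consume from the production
# brokers, produce into the brokers on the Samza grid.
write_mirror_configs() {
  local dir="$1" source_brokers="$2" samza_brokers="$3"
  mkdir -p "$dir"
  cat > "$dir/source-consumer.properties" <<EOF
bootstrap.servers=${source_brokers}
group.id=samza-mirror
EOF
  cat > "$dir/samza-producer.properties" <<EOF
bootstrap.servers=${samza_brokers}
EOF
}

# Mirror only the topics matching the regular expression in $2.
start_mirror() {
  local dir="$1" topics="$2"
  kafka-mirror-maker.sh \
    --consumer.config "$dir/source-consumer.properties" \
    --producer.config "$dir/samza-producer.properties" \
    --whitelist "$topics"
}
```

For example, `write_mirror_configs /tmp/samza-mirror kafka1.mydomain.net:9092 samza1.mydomain.net:9092` followed by `start_mirror /tmp/samza-mirror 'clicks|page-views'`, with hypothetical hosts and topic names.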
Alternatively, install the Samza grid on the same machines as the Kafka
brokers used to collect data. This has some slight operational disadvantages,
as it is always possible that a processing job could lock up a machine and
bring it down. However, it likely makes better use of the hardware, because
Kafka brokers usually have spare processing cycles.