documentation only considers the map-reduce case. This makes it difficult
for an inexperienced user to get a working installation set up.
A user already familiar with the maintenance of a YARN cluster will have
much more success with Samza and should consider it for implementing a
real-time application. In many ways it is a cleaner design than Storm, and
its native support of Kafka makes it easy to integrate into a Kafka-based
environment.
First-time users will generally have more success with Storm. Although it
too can be hosted in a YARN cluster, that is not the typical deployment. The
non-YARN deployment discussed in this topic is much easier for new users
to understand, and it's relatively easy to manage. The only disadvantage is
that it does not include native support for either Kafka or Flume. Chapter 5
addresses the integration, but the plug-ins are developed on a quite
different cycle from the mainline Storm code, which can cause
incompatibilities around the time of a release.
Storage
For relatively small build-outs, such as a proof of concept or a fairly
low-volume application, Redis is the obvious choice for data storage. It has
a rich set of abstractions beyond the simple key-value store that allows for
sophisticated storage. It is easy to install and configure, and it requires
almost no ongoing maintenance (it is even available as a turnkey option
from various cloud-based providers). It also has a wide array of available
clients.
The two drawbacks of Redis are that it has not really addressed the problem
of horizontal scalability for writes and that it is limited by the random
access memory (RAM) available on the master server. Like many of the tools in this
topic, it offers a master-slave style of replication with the ability to fail over
to one of the replicas using Redis Sentinel. However, this still requires that
all writes go to a single server. There are several client-side projects, such as
Twitter's Twemproxy, that attempt to work around this limitation, but there
is no native Redis solution as of yet. There is an ongoing clustering project,
but there is no timeline for its stable release.
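The general idea behind such workarounds can be sketched as follows: the client hashes each key and uses the result to pick one of several independent Redis masters, so writes are spread across servers rather than sent to one. This is only a minimal illustration using consistent hashing from the Python standard library. It is not Twemproxy's actual algorithm, the HashRing class and the server addresses are hypothetical, and no Redis connection is made.

```python
import bisect
import hashlib


class HashRing:
    """Maps keys to one of several independent servers via consistent hashing."""

    def __init__(self, nodes, replicas=100):
        # Each node gets `replicas` points on the ring to even out the spread.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # MD5 is used only as a stable, well-distributed hash, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First ring point at or after the key's hash, wrapping around at the end.
        index = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._ring[index][1]


# Hypothetical master addresses, for illustration only.
ring = HashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
assignments = {key: ring.node_for(key) for key in ("user:1", "user:2", "page:home")}
```

A given key always maps to the same server, so a read can find what an earlier write stored. The trade-off is that operations spanning keys on different servers are no longer handled by Redis itself and must be coordinated by the client.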
If very large amounts of infrequently accessed data are going to be needed,
Cassandra is an excellent choice. Early versions of Cassandra suffered from
a “query language” that was essentially just an RPC layer on the internal