be able to detect the additional machines and evenly distribute some of our existing data to these machines accordingly. Because the data is distributed across nodes, incoming database requests can be automatically routed to the correct node on the network.
This ability to simply add individual machines to a pool without worrying too much about configuring new application logic is known as linear scalability, and in practice, it can be difficult to achieve. Anytime a single piece of data needs to be accessed by more than one machine, there is potential for bottlenecks to appear. For example, if one machine is writing a piece of data, and another wants to do the same thing at the same time, the result is a resource conflict. These problems are challenging, but luckily there are a variety of strategies available for distributing, or sharding, data across many machines.
One way to shard data across multiple Redis instances is to decide on a key range beforehand. This is the simplest approach, but it has a drawback: it is not robust as the data grows. For example, imagine that your application collects the latest scores from thousands of players of an online game. If you have several instances of Redis, your application might be instructed to send scores from usernames starting with A through C to one instance, scores from players with names starting with D through F to another instance, and so on.
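A minimal sketch of this key-range approach follows. The instance names and letter ranges here are hypothetical (two instances splitting the alphabet in half); the point is only that the routing rule is fixed in application code, which is why repartitioning becomes painful as data grows.

```python
# Hypothetical key-range sharding: route each username to a Redis
# instance based on its first letter. Adding a third instance would
# force us to rewrite these ranges and migrate existing data.
SHARDS = [
    ("redis-a", ("A", "M")),  # usernames starting A through M
    ("redis-b", ("N", "Z")),  # usernames starting N through Z
]

def shard_for(username):
    """Return the name of the shard responsible for this username."""
    first = username[0].upper()
    for shard_name, (low, high) in SHARDS:
        if low <= first <= high:
            return shard_name
    raise ValueError("no shard covers username %r" % username)
```

The application would open one connection per shard and call `shard_for` before every read or write to pick the right connection.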
Automatic Partitioning with Twemproxy
Underneath it all, Redis is really designed to be a performant single-server database. Although the fact that Redis is a key-value data store makes it a bit easier to distribute an entire dataset across various instances in a cluster pool, we still need to choose and implement some kind of sharding strategy, as described earlier, to make it work. As of this writing, the developers of Redis have been working on a native, fault-tolerant version of the standalone server that allows for automatic cluster management.
In this example, we will demonstrate an open-source technology developed at Twitter, called Twemproxy (originally called nutcracker), to help partition our data among a pool of Redis instances, running on a single machine or across multiple machines. Twemproxy accepts requests from clients and uses a configured hashing function to decide which instance in the pool is responsible for handling each request. Twemproxy can speak not only to Redis instances but also to Memcached, another popular in-memory key-value store that is often used as a data cache for high-traffic applications. According to the Redis development team, Twemproxy is the recommended way to shard data among multiple Redis instances.
Twemproxy also handles failure scenarios. If a particular instance in a pool of Redis machines is down, Twemproxy can be instructed to wait a short time before retrying the request. It can also be instructed to eject nodes from the pool if they are down due to failure.
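As a sketch of how this might be configured, a Twemproxy pool definition is written in YAML. The pool name, addresses, and ports below are hypothetical; the option names follow the nutcracker configuration format, with timeouts given in milliseconds.

```yaml
scores:
  listen: 127.0.0.1:22121        # address Twemproxy exposes to clients
  hash: fnv1a_64                 # hashing function applied to each key
  distribution: ketama           # consistent hashing across the pool
  redis: true                    # speak the Redis protocol, not Memcached
  auto_eject_hosts: true         # drop unreachable servers from the pool
  server_retry_timeout: 30000    # wait 30s before retrying an ejected server
  server_failure_limit: 3        # eject after 3 consecutive failures
  servers:                       # the Redis instances behind this pool
    - 127.0.0.1:6379:1
    - 127.0.0.1:6380:1
```

Clients then connect to Twemproxy on port 22121 as if it were a single Redis server, and the proxy routes each command to the appropriate backend.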
When Twemproxy receives a request to get or set the value for a particular key, how does it know which machine to contact? Twemproxy supports a variety of hashing