systems, its linear read performance is often very good, making them an
excellent choice for offline analysis.
Hadoop has now become the gold standard for these batch environments.
This chapter describes setting up and integrating Hadoop into the streaming
environment.
This chapter also discusses the basics of using Hadoop as a gateway to
existing business intelligence infrastructures. This is because,
although the focus here is on real-time streaming data analysis, no
modern analytics system exists in a vacuum. A system must integrate with
other pieces of an organization's environment if it hopes to gain widespread
adoption.
Consistent Hashing
Several of the data stores described in the remainder of this chapter support
the distribution of data across several servers. This allows them to scale
more easily as the size of the data grows. It also generally improves
performance for both updates and queries. A popular technique to
implement the distribution of data is known as consistent hashing.
Most data stores have some notion of a “key” element. In a relational
database it is called a primary key, and in key-value stores it is simply called
the key. The most important property of a key is that there can be only
a single entry for it in the data store. (Relational databases that do not have
a specified primary key generally create an arbitrary primary key that is
hidden from the user.)
Each of these primary keys corresponds to a numerical value that is then
assigned to a specific server. The mechanism used to perform this
assignment varies, but the effect is that a specific primary key is always
assigned to the same server for storage and querying.
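The assignment described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which mechanism any particular data store uses, so the sketch assumes the simplest one (hash the key to a number, then take the remainder modulo the number of servers). The function names and server names are hypothetical.

```python
import hashlib
from typing import Sequence


def key_to_number(key: str) -> int:
    """Map a key to a stable 64-bit integer (assumed scheme: first 8 bytes of MD5)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_server(key: str, servers: Sequence[str]) -> str:
    """Pick a server for the key using a simple modulo (an assumption, not a
    description of any specific data store)."""
    return servers[key_to_number(key) % len(servers)]


# Hypothetical server names, for illustration only.
servers = ["server-a", "server-b", "server-c"]

for key in ["user:1", "user:2", "user:3"]:
    # The same key always maps to the same server while the server list is unchanged.
    print(key, "->", assign_server(key, servers))
```

A cryptographic hash is used here rather than Python's built-in `hash()` because the built-in is randomized per process for strings, which would break the requirement that a key always lands on the same server.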
This helps to distribute the load across several servers, but does so at
the expense of reliability. If any of the servers crashes, then the data stored
on that server becomes unavailable and, as the number of servers increases,
the probability that at least one server is down at any given time increases.
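To make the reliability cost concrete, here is a rough calculation. It assumes that servers fail independently and that each one is down with the same probability at any instant; neither assumption comes from the text, and the per-server figure used below is purely illustrative. Under those assumptions, the chance that at least one of n servers is down is 1 - (1 - p)^n.

```python
def prob_any_server_down(p_down: float, n_servers: int) -> float:
    """Probability that at least one of n_servers is down, assuming each is
    independently down with probability p_down (a simplifying assumption)."""
    return 1.0 - (1.0 - p_down) ** n_servers


# Illustrative per-server downtime probability (hypothetical value).
p = 0.001

for n in (1, 10, 100, 1000):
    print(n, "servers:", prob_any_server_down(p, n))
```

The value grows steadily with the number of servers, which is why a scheme that leaves each key on exactly one server becomes less dependable as the cluster scales.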
To improve the reliability of the system, the data is instead consistently
hashed. In consistent hashing, the servers are first placed into a specific
stable ordering, called a ring. When a primary key is modified the initial