Introducing Big Data Technologies - Data Warehousing in the Age of Big Data

FIGURE 4.26 Cassandra ring architecture.

Data placement

Data placement around the ring is not fixed in any default configuration. Cassandra provides two components, called snitches and strategies, to determine which nodes will receive copies of data.

● Snitches define the proximity of nodes within the ring and provide information on the network topology.

● Strategies combine the node-proximity information that snitches provide with an implemented algorithm to select the nodes that will receive writes.
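The division of labor between the two components can be sketched as follows. This is a minimal illustration, not Cassandra's actual API: the `Node` type, the `RackSnitch` class, the `rack_aware_replicas` function, and the rack-preferring selection rule are all hypothetical names and assumptions introduced here. The snitch only answers topology questions; the strategy walks the ring and uses those answers to pick the nodes that receive a write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    token: int   # position on the ring
    rack: str    # topology detail known to the snitch


class RackSnitch:
    """Hypothetical snitch: reports which rack a node belongs to."""

    def __init__(self, nodes):
        self._rack = {n.name: n.rack for n in nodes}

    def rack_of(self, node):
        return self._rack[node.name]


def rack_aware_replicas(key_token, nodes, snitch, replication_factor):
    """Hypothetical strategy: walk the ring clockwise from the key's token,
    preferring nodes on racks that do not yet hold a copy."""
    ring = sorted(nodes, key=lambda n: n.token)
    # First node whose token is >= the key's token, wrapping to index 0.
    start = next((i for i, n in enumerate(ring) if n.token >= key_token), 0)
    ordered = ring[start:] + ring[:start]

    replicas, seen_racks = [], set()
    # First pass: take at most one node per rack.
    for node in ordered:
        if len(replicas) == replication_factor:
            break
        rack = snitch.rack_of(node)
        if rack not in seen_racks:
            replicas.append(node)
            seen_racks.add(rack)
    # Second pass: fill any remaining slots in plain ring order.
    for node in ordered:
        if len(replicas) == replication_factor:
            break
        if node not in replicas:
            replicas.append(node)
    return replicas


nodes = [Node("A", 0, "r1"), Node("B", 25, "r1"),
         Node("C", 50, "r2"), Node("D", 75, "r2")]
snitch = RackSnitch(nodes)
print([n.name for n in rack_aware_replicas(10, nodes, snitch, 2)])  # ['B', 'C']
```

Swapping in a different snitch or a different selection rule changes where copies land without touching the ring itself, which is the sense in which placement is not fixed.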

Data partitioning

Data is distributed across the nodes by partitioners. Because Cassandra is based on a ring topology, the ring is divided into as many ranges as there are nodes, and each node can be responsible for one or more ranges of the data. When a node joins the ring, it is issued a token; this token determines the node's position on the ring and the range of data it is responsible for. Once the assignment is made, it cannot be undone without reloading all the data.
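To make the relationship between tokens and ranges concrete, here is a minimal sketch. It assumes a toy ring of 100 positions and evenly spaced tokens; the function names (`evenly_spaced_tokens`, `token_ranges`, `owner`) are hypothetical and do not correspond to Cassandra internals. Each node owns the range that ends at its own token and begins just after the previous node's token, with the first node's range wrapping around the ring.

```python
import bisect


def evenly_spaced_tokens(node_count, ring_size):
    """Assumed token assignment: spread nodes evenly around the ring."""
    return [i * ring_size // node_count for i in range(node_count)]


def token_ranges(tokens):
    """Map each node token to the (start_exclusive, end_inclusive) range it owns."""
    ordered = sorted(tokens)
    # ordered[i - 1] wraps to the last token when i == 0.
    return {tok: (ordered[i - 1], tok) for i, tok in enumerate(ordered)}


def owner(key_token, tokens):
    """Return the token of the node responsible for key_token."""
    ordered = sorted(tokens)
    i = bisect.bisect_left(ordered, key_token)
    return ordered[i % len(ordered)]  # past the last token wraps to the first


tokens = evenly_spaced_tokens(4, 100)
print(tokens)                # [0, 25, 50, 75]
print(token_ranges(tokens))  # {0: (75, 0), 25: (0, 25), 50: (25, 50), 75: (50, 75)}
print(owner(10, tokens))     # 25
print(owner(80, tokens))     # 0
```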

Cassandra provides native partitioners and also supports user-defined partitioners. The key difference among the native partitioners is whether they preserve the order of keys.

● Random partitioner. This is the default choice for Cassandra. It uses an MD5 hash function to map keys to tokens, which are distributed evenly across the cluster. The hashing technique ensures that when nodes are added to the cluster, the smallest possible set of data is affected. Although the keys are evenly distributed, the data is not ordered, so a query over a range of keys must be processed by all nodes.
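The behavior described for the random partitioner can be illustrated with a short sketch. The ring size of 2**127, the modulo reduction, and the helper names (`md5_token`, `owner`, `moved_keys`) are assumptions made for illustration; Cassandra's own token derivation differs in detail. The sketch shows that a key's token comes from its MD5 digest, and that adding one node reassigns only the keys falling in the new node's range.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space, for illustration only


def md5_token(key):
    """Map a key to a ring position using its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(token, node_tokens):
    """Return the node token responsible for the given key token."""
    ordered = sorted(node_tokens)
    i = bisect.bisect_left(ordered, token)
    return ordered[i % len(ordered)]  # wrap around the ring


def moved_keys(keys, before, after):
    """Keys whose owning node changes between two ring layouts."""
    return [k for k in keys
            if owner(md5_token(k), before) != owner(md5_token(k), after)]


keys = [f"user:{i}" for i in range(1000)]
before = [i * RING_SIZE // 4 for i in range(4)]   # four evenly spaced nodes
new_node = RING_SIZE // 8                         # one node joins the ring
after = before + [new_node]

moved = moved_keys(keys, before, after)
# Every key that moved now belongs to the new node; all others stay put.
print(all(owner(md5_token(k), after) == new_node for k in moved))  # True
print(len(moved), "of", len(keys), "keys changed owner")
```

Because neighboring keys such as `user:1` and `user:2` hash to unrelated tokens, the keys in any contiguous key range end up scattered around the ring, which is why key order is not preserved.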
