Cassandra Key Space, super column family USER

RowKey | Personal: Name | Personal: Gender | Contact: Phone | Contact: Address
User1  | Duc A. Tran    | M                | 617-287-6452   | 100 Morrissey Blvd
User2  | Jane Doe       | F                |                | 25 Main Street

Fig. 1.4 Cassandra table with a super column family
names → column values}, a row in a super column family is a sorted map of
{super column names → maps of column names to column values}. Figure 1.4
shows a super column family named “USER,” which consists of two super columns,
“Personal” and “Contact.”
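To make the nested-map structure concrete, the following Python sketch models the USER super column family of Fig. 1.4 with plain dictionaries. It is only an illustration under stated assumptions: Cassandra keeps super columns and columns sorted by name internally, whereas here the ordering is emulated on read by the hypothetical helpers `sorted_view` and `get_column`. Neither helper is part of any Cassandra API.

```python
from typing import Optional

# Hypothetical in-memory model of a super column family (not a Cassandra API).
# Row: super column name -> {column name -> column value}
SuperRow = dict[str, dict[str, str]]

# Column family: row key -> row, populated with the data from Fig. 1.4.
user_cf: dict[str, SuperRow] = {
    "User1": {
        "Personal": {"Name": "Duc A. Tran", "Gender": "M"},
        "Contact": {"Phone": "617-287-6452", "Address": "100 Morrissey Blvd"},
    },
    "User2": {
        "Personal": {"Name": "Jane Doe", "Gender": "F"},
        # Rows are sparse: User2 simply has no Phone column.
        "Contact": {"Address": "25 Main Street"},
    },
}


def sorted_view(row: SuperRow) -> list[tuple[str, list[tuple[str, str]]]]:
    """Return (super column, sorted columns) pairs, both levels ordered by name."""
    return [(name, sorted(columns.items())) for name, columns in sorted(row.items())]


def get_column(cf: dict[str, SuperRow], row_key: str,
               super_column: str, column: str) -> Optional[str]:
    """Look up one column value; return None if any level of the path is absent."""
    return cf.get(row_key, {}).get(super_column, {}).get(column)


if __name__ == "__main__":
    for name, columns in sorted_view(user_cf["User1"]):
        print(name, columns)
    print(get_column(user_cf, "User2", "Contact", "Phone"))
```

Because names are sorted at both levels, the "Contact" super column is listed before "Personal", and within it "Address" precedes "Phone". The missing Phone column of User2 comes back as `None` rather than as an error.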
1.3.2 Partitioning
Like Dynamo, Cassandra partitions data across the storage nodes by consistent
hashing of the row key (here, the user ID). Each node is assigned a unique
position on the ring, called its token, and is responsible for the row keys whose
hashes fall between its predecessor's token and its own token. Cassandra differs
from Dynamo in how it balances load across heterogeneous nodes. As discussed
in Sect. 1.1.1, Dynamo uses virtual nodes to improve load balancing; Cassandra
does not. Instead, it analyzes the load information of each physical node and
redistributes load whenever the system detects an imbalance. This choice keeps
the design and implementation tractable and makes load-balancing decisions
deterministic.
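The partitioning rule and the load-redistribution idea above can be sketched in Python as follows. This is a simplified model under several assumptions: row keys are hashed with MD5 truncated to a 32-bit ring (this section does not specify Cassandra's hash function), and the hypothetical `rebalance` method moves the least loaded node's token to the midpoint of the most loaded node's range. That is one plausible way to "redistribute load," not necessarily Cassandra's exact procedure. `TokenRing`, `token_of`, and `RING_SIZE` are illustrative names only.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # assumed token space for this sketch


def token_of(row_key: str) -> int:
    """Hash a row key onto the ring (MD5 truncated to 32 bits; an assumption)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class TokenRing:
    """Nodes placed on a ring by token; each owns (predecessor token, own token]."""

    def __init__(self, tokens: dict[str, int]) -> None:
        self.tokens = dict(tokens)  # node name -> token (assumed unique)
        self._rebuild()

    def _rebuild(self) -> None:
        self._ring = sorted((token, node) for node, token in self.tokens.items())
        self._positions = [token for token, _ in self._ring]

    def owner(self, token: int) -> str:
        """First node whose token is >= the key's token, wrapping past the top."""
        i = bisect.bisect_left(self._positions, token)
        return self._ring[i % len(self._ring)][1]

    def predecessor_token(self, node: str) -> int:
        i = self._positions.index(self.tokens[node])
        return self._ring[i - 1][0]  # i - 1 == -1 wraps to the last node

    def rebalance(self, load: dict[str, int], ratio: float = 2.0) -> bool:
        """Hypothetical policy: if the busiest node carries more than `ratio`
        times the load of the idlest one, move the idlest node to the midpoint
        of the busiest node's range so it takes over roughly half of its keys."""
        heavy = max(load, key=load.get)
        light = min(load, key=load.get)
        if heavy == light or load[heavy] <= ratio * load[light]:
            return False
        low = self.predecessor_token(heavy)
        span = (self.tokens[heavy] - low) % RING_SIZE
        if span < 2:
            return False  # no free position strictly inside the range
        self.tokens[light] = (low + span // 2) % RING_SIZE
        self._rebuild()
        return True


if __name__ == "__main__":
    ring = TokenRing({"A": 0, "B": RING_SIZE // 3, "C": 2 * RING_SIZE // 3})
    for key in ("User1", "User2"):
        print(key, "->", ring.owner(token_of(key)))
```

Moving a node hands its old range to its successor, so a real system would re-measure load after each move; the sketch leaves that feedback loop to the caller.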
1.3.3 Replication
Cassandra allows the application to choose a replication policy on top of the
data partitioning. One policy, called “Rack Unaware,” replicates each data item
on the successor nodes of its coordinator node on the ring, an approach similar
to Dynamo's. Other replication policies, “Rack Aware” and “Data Center Aware,”
take into account the load balancing across