Introduction - Data Storage for Social Networks: A Socially Aware Approach


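Cassandra Key Space, super column family USER

       | Personal            | Contact
RowKey | Name        | Genre | Phone        | Address
-------+-------------+-------+--------------+--------------------
User1  | Duc A. Tran | M     | 617-287-6452 | 100 Morrissey Blvd
User2  | Jane Doe    | F     |              | 25 Main Street

Fig. 1.4 Cassandra table with a super column family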

names → column values}, a row in a super column family is a sorted map of {super column names → maps of column names to column values}. Figure 1.4 shows a super column family named “USER,” which consists of two super columns, “Personal” and “Contact.”
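The nested-map structure can be illustrated with a minimal Python sketch of the USER data in Fig. 1.4. This is an in-memory model only, not Cassandra's API: the variable `user` and the helper `read_row` are hypothetical names, and the missing Phone column for User2 is an assumption read from the figure.

```python
# Hypothetical in-memory model of the USER super column family in Fig. 1.4.
# It illustrates the nested-map shape only; it is not Cassandra's API.
user = {
    "User1": {
        "Personal": {"Name": "Duc A. Tran", "Genre": "M"},
        "Contact": {"Phone": "617-287-6452", "Address": "100 Morrissey Blvd"},
    },
    "User2": {
        "Personal": {"Name": "Jane Doe", "Genre": "F"},
        # Assumption: the figure shows no phone for User2, so no column is stored.
        "Contact": {"Address": "25 Main Street"},
    },
}


def read_row(family, row_key):
    """Return a row as (super column, [(column, value), ...]) pairs.

    Both levels are returned in name order to mimic the sorted maps
    described in the text.
    """
    row = family[row_key]
    return [(sc_name, sorted(row[sc_name].items())) for sc_name in sorted(row)]


if __name__ == "__main__":
    for super_column, columns in read_row(user, "User1"):
        print(super_column, columns)
```

A row here simply omits a column that has no value, rather than storing an empty placeholder.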

1.3.2 Partitioning

Like Dynamo, Cassandra uses consistent hashing based on user ID to partition the data across the storage nodes. Each node is assigned a unique position, or token, on the ring and is responsible for the range of row keys from its predecessor node's token to its own token. Cassandra differs from Dynamo in how it handles load balancing when the nodes are heterogeneous. As discussed in Sect. 1.1.1, Dynamo uses the concept of virtual nodes to improve load balancing. Cassandra does not implement this approach. Instead, it analyzes the load information at each physical node and redistributes load whenever the system detects an imbalance. The aim is to keep the design and implementation tractable and to make load-balancing decisions deterministic.
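The token-ring lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cassandra's implementation: the `Ring` class, the 32-bit token space, the use of MD5 to hash keys, and the choice to include a node's own token in its range are all assumptions made for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed token space; the text does not specify one


def token_for(key):
    """Hash a row key onto the ring (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


class Ring:
    """Hypothetical ring: each node owns (predecessor token, own token]."""

    def __init__(self, node_tokens):
        # node_tokens maps node name -> token; sort by token for binary search.
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner_of_token(self, token):
        """Return the first node whose token is >= the given token."""
        idx = bisect.bisect_left(self._tokens, token)
        # Tokens past the highest node token wrap around to the first node.
        return self._entries[idx % len(self._entries)][1]

    def owner(self, key):
        """Return the node responsible for a row key."""
        return self.owner_of_token(token_for(key))


if __name__ == "__main__":
    ring = Ring({"A": 100, "B": 200, "C": 300})
    print(ring.owner_of_token(150))  # falls in (100, 200], owned by B
    print(ring.owner("User1"))
```

With one token per physical node, as in this sketch, changing a node's share of the data means moving its token, which is one way to picture the load redistribution the text describes.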

1.3.3 Replication

Cassandra lets the application choose a replication policy that is applied on top of the data partitioning. One policy provided by Cassandra, “Rack Unaware,” replicates each data item on the successor nodes of its coordinator node on the ring. This approach is similar to that of Dynamo. The other replication policies, “Rack Aware” and “Data Center Aware,” take into account the load balancing across
