Introduction - Data Storage for Social Networks: A Socially Aware Approach


1.3 Apache Cassandra

Cassandra [21], initially developed at Facebook and later released as an Apache open-source project, is a data store that combines ideas from Dynamo and BigTable: it can be described as a BigTable data model running on a Dynamo-like server infrastructure. Cassandra is arguably the most popular choice for implementing a large-scale distributed storage system today; it has been used by Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, and Cisco, to name a few. Below, we discuss some key features of Cassandra.
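To make "Dynamo-like server infrastructure" more concrete, here is a minimal Python sketch of consistent hashing with successor replication, the partitioning scheme Cassandra inherited from Dynamo. It is a toy under stated assumptions, not Cassandra's code: each node has exactly one token derived from a hash of its name, MD5 is used as the hash (loosely modeled on Cassandra's older RandomPartitioner), and the names `Ring`, `token`, and `replicas` are hypothetical. Real clusters assign tokens explicitly or use virtual nodes, and replica placement is governed by a configurable replication strategy.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    # Hash a key to a position on the ring (128-bit MD5; an assumption
    # loosely modeled on Cassandra's older RandomPartitioner).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Toy Dynamo-style ring: each node holds one token and owns the arc
    ending at that token (exclusive start, inclusive end)."""

    def __init__(self, nodes: List[str], replication_factor: int = 3) -> None:
        self.rf = replication_factor
        # Hypothetical placement: a node's token is the hash of its name.
        self.tokens = sorted((token(n), n) for n in nodes)
        self._keys = [t for t, _ in self.tokens]

    def replicas(self, row_key: str) -> List[str]:
        # The first node clockwise from the key's token stores the row;
        # the next rf - 1 nodes on the ring store the additional replicas.
        start = bisect.bisect_left(self._keys, token(row_key)) % len(self.tokens)
        count = min(self.rf, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct nodes, adjacent on the ring
```

Because each node owns only the arc of the ring ending at its token, adding or removing a node moves only the keys on the neighboring arc, which is what lets this design grow one node at a time.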

1.3.1 Data Model

What a relational system calls a "database" is called a keyspace in Cassandra. Just as a relational database contains a set of relations (tables), a Cassandra keyspace contains a set of column families. In early Cassandra releases, column families, like relational tables, had to be defined when the keyspace was created and could not be modified thereafter; adding or removing a column family required restarting the nodes. (Since version 0.7, Cassandra supports adding, altering, and dropping column families at runtime.) The fundamental difference between a relational table and a column family, however, is that all rows of a table have the same columns, whereas different rows of a column family do not have to share the same columns. A row can have any number of columns, and these columns can vary from row to row. For example, Fig. 1.2 shows a Cassandra column family with two rows, one with three columns and one with two.

Fig. 1.2 Cassandra's table with column families (source: [16])
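The sparse row structure described above can be illustrated with nested Python dictionaries. This is a minimal in-memory sketch, not Cassandra's storage engine or client API: the helpers `insert` and `get_row`, the `Users` column family, and its rows and columns are all hypothetical, and they are not the contents of Fig. 1.2. The sketch mirrors two points from the text: a column family must be defined in the keyspace before it is written to, while each row's columns are created freely and may differ from row to row. Each column also carries a write timestamp, as columns in Cassandra's original data model do.

```python
import time
from typing import Dict, Tuple

# Hypothetical in-memory model of a keyspace:
#   keyspace -> column family -> row key -> column name -> (value, timestamp)
Row = Dict[str, Tuple[str, int]]
ColumnFamily = Dict[str, Row]
Keyspace = Dict[str, ColumnFamily]


def insert(ks: Keyspace, cf: str, row_key: str, column: str, value: str) -> None:
    # Column families must already be defined in the keyspace...
    if cf not in ks:
        raise KeyError(f"column family {cf!r} is not defined")
    # ...but rows and columns are created on write, so each row has its own set.
    ts = time.time_ns() // 1000  # microseconds since the epoch
    ks[cf].setdefault(row_key, {})[column] = (value, ts)


def get_row(ks: Keyspace, cf: str, row_key: str) -> Dict[str, str]:
    # Return column values ordered by column name; a missing row yields {}.
    row = ks[cf].get(row_key, {})
    return {name: row[name][0] for name in sorted(row)}


ks: Keyspace = {"Users": {}}  # one predefined, hypothetical column family
insert(ks, "Users", "alice", "name", "Alice")
insert(ks, "Users", "alice", "email", "alice@example.com")
insert(ks, "Users", "alice", "city", "Paris")
insert(ks, "Users", "bob", "name", "Bob")
insert(ks, "Users", "bob", "age", "31")
print(get_row(ks, "Users", "alice"))  # three columns
print(get_row(ks, "Users", "bob"))    # two columns, not all shared with alice
```

Returning columns sorted by name loosely reflects Cassandra's practice of storing a row's columns in comparator order; the relational equivalent would instead need one fixed schema with NULLs for every column a row lacks.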
