1.3 Apache Cassandra
Cassandra [21], initially developed at Facebook and later released as an Apache
open source project, is a data store that combines ideas from Dynamo and
BigTable: it runs a BigTable-style data model on a Dynamo-like server
infrastructure. Cassandra is arguably the most popular choice for implementing
a large-scale distributed storage system today.
It has been used by Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, and
Cisco, to name a few. Below, we discuss some key features of Cassandra.
1.3.1 Data Model
A “database,” the core concept of relational databases, is called a keyspace in
Cassandra. Just as a database contains a set of relations or tables, a Cassandra
keyspace contains a set of column families. As with relational tables, column
families must be declared when the keyspace is defined. In early Cassandra
releases they could not be modified thereafter, and adding or removing a column
family required restarting the nodes; Cassandra 0.7 and later support such schema
changes at runtime. The fundamental difference between a relational table and a
column family, however, is that all rows of a relational table have the same
columns, whereas different rows of a column family need not share the same
columns. A row can have any number of columns, and these columns can vary from
row to row. For example, Fig. 1.2 shows a Cassandra column family with two rows,
one with three columns and one with two columns.
Fig. 1.2 Cassandra's table with column families (source: [16])
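To make the data model concrete, here is a minimal in-memory sketch in Python. It is not Cassandra's API or storage format: the classes `Keyspace` and `ColumnFamily`, their methods, and the names `Blog`, `Users`, `alice`, and `bob` are hypothetical. It assumes a column family can be modeled as a map from row key to a map of column name to value, and that the set of column families is fixed when the keyspace is created, as described above.

```python
from typing import Dict, Iterable


class ColumnFamily:
    """Hypothetical in-memory column family: row key -> {column name -> value}.

    Unlike a relational table there is no fixed column list; each row
    stores only the columns that were actually written to it.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, Dict[str, str]] = {}

    def insert(self, row_key: str, column: str, value: str) -> None:
        # Create the row on first write, then add or overwrite one column.
        self.rows.setdefault(row_key, {})[column] = value

    def get_row(self, row_key: str) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the stored row.
        return dict(self.rows.get(row_key, {}))


class Keyspace:
    """Hypothetical keyspace: a named, fixed set of column families."""

    def __init__(self, name: str, column_families: Iterable[str]) -> None:
        self.name = name
        # The column families are declared once, at creation time;
        # this sketch offers no way to add or drop them later.
        self._cfs = {cf: ColumnFamily(cf) for cf in column_families}

    def column_family(self, name: str) -> ColumnFamily:
        if name not in self._cfs:
            raise KeyError(f"column family {name!r} is not defined in keyspace {self.name!r}")
        return self._cfs[name]


# Two rows in the same column family with different column sets.
ks = Keyspace("Blog", ["Users"])
users = ks.column_family("Users")
users.insert("alice", "name", "Alice")
users.insert("alice", "email", "alice@example.com")
users.insert("alice", "city", "Oslo")
users.insert("bob", "name", "Bob")
users.insert("bob", "age", "42")

print(users.get_row("alice"))  # {'name': 'Alice', 'email': 'alice@example.com', 'city': 'Oslo'}
print(users.get_row("bob"))    # {'name': 'Bob', 'age': '42'}
```

The two `Users` rows follow the shape described for Fig. 1.2: one row holds three columns, the other holds two, and neither declares its columns in advance. Only the column families themselves are fixed up front, so asking for an undeclared one such as `ks.column_family("Posts")` raises `KeyError`.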