The Cassandra Data Model - Cassandra: The Definitive Guide


in a regular column family is a row key pointing to a column name pointing to a value, while

the address of a value in a column family of type “super” is a row key pointing to a column

name pointing to a subcolumn name pointing to a value. Put slightly differently, a row in a super

column family still contains columns, each of which then contains subcolumns.
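The two addressing schemes can be pictured as nested maps. The following is only a minimal sketch using plain Python dictionaries as stand-ins for column families; it is not how Cassandra stores data, and every row key, column name, and value in it is invented for illustration.

```python
# Illustrative model only: plain dicts standing in for column families.
# All row keys, column names, and values below are made up.

# Regular column family: row key -> column name -> value
users = {
    "jsmith": {
        "email": "jsmith@example.com",
        "city": "Phoenix",
    },
}

# Super column family: row key -> column name -> subcolumn name -> value
user_addresses = {
    "jsmith": {
        "home": {"street": "1 Main St", "city": "Phoenix"},
        "work": {"street": "9 Market St", "city": "Tempe"},
    },
}


def get_value(column_family, row_key, column_name):
    """Resolve a value in a regular column family (two lookups)."""
    return column_family[row_key][column_name]


def get_super_value(column_family, row_key, column_name, subcolumn_name):
    """Resolve a value in a super column family (one extra lookup)."""
    return column_family[row_key][column_name][subcolumn_name]


regular_value = get_value(users, "jsmith", "email")
super_value = get_super_value(user_addresses, "jsmith", "work", "city")
```

The only difference between the two lookups is the extra subcolumn level, which is the point the paragraph above makes.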

So that's the bottom-up approach to looking at Cassandra's data model. Now that we have this

basic understanding, let's switch gears and zoom out to a higher level, in order to take a top-down

approach. There is so much confusion on this topic that it's worth it to restate things in a

different way in order to thoroughly understand the data model.

Clusters

Cassandra is probably not the best choice if you only need to run a single node. As previously

mentioned, the Cassandra database is specifically designed to be distributed over several machines

operating together that appear as a single instance to the end user. So the outermost structure

in Cassandra is the cluster, sometimes called the ring, because Cassandra assigns data to

nodes in the cluster by arranging them in a ring.

A node holds a replica for different ranges of data. If the first node goes down, a replica can respond

to queries. The peer-to-peer protocol allows the data to replicate across nodes in a manner

transparent to the user, and the replication factor is the number of machines in your cluster that

will receive copies of the same data. We'll examine this in greater detail in Chapter 6.
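To make the ring and the replication factor more concrete, here is a rough sketch of how a row key might be mapped to a set of nodes. It is a simplification under stated assumptions, not Cassandra's actual code: the `Ring` class, the `token_for` helper, the node names, and the use of MD5 to place nodes on the ring are all hypothetical choices made for illustration, and replicas are chosen simply by walking around the ring from the first matching node.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: nodes sorted by token, replicas found by walking the ring."""

    def __init__(self, node_names, replication_factor):
        if not 1 <= replication_factor <= len(node_names):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Each node gets a ring position derived from its (hypothetical) name.
        self._ring = sorted((token_for(name), name) for name in node_names)
        self._tokens = [token for token, _ in self._ring]

    def replicas_for(self, row_key: str):
        """Return the nodes that would hold copies of the given row key."""
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self._tokens, token_for(row_key)) % len(self._ring)
        return [
            self._ring[(start + offset) % len(self._ring)][1]
            for offset in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
replicas = ring.replicas_for("jsmith")
```

With a replication factor of 3, `replicas` holds three distinct node names, so if the first one is unavailable, either of the remaining two could answer a query for that key.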

Keyspaces

A cluster is a container for keyspaces, typically a single keyspace. A keyspace is the outermost

container for data in Cassandra, corresponding closely to a relational database. Like a relational

database, a keyspace has a name and a set of attributes that define keyspace-wide behavior. Although

people frequently advise that it's a good idea to create a single keyspace per application,

this doesn't appear to have much practical basis. It's certainly an acceptable practice, but it's perfectly

fine to create as many keyspaces as your application needs. Note, however, that you will

probably run into trouble creating thousands of keyspaces per application.

Depending on your security constraints and partitioner, it's fine to run multiple keyspaces on the

same cluster. For example, if your application is called Twitter, you would probably have a cluster

called Twitter-Cluster and a keyspace called Twitter. To my knowledge, there are currently

no naming conventions in Cassandra for such items.
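The containment relationship described here (a cluster holds keyspaces, and each keyspace has a name plus a set of attributes) can be sketched as a small data structure. This is a conceptual model only and not a Cassandra client API: the `Cluster` and `Keyspace` classes and the `create_keyspace` method are hypothetical, and `example_setting` is a placeholder rather than a real keyspace attribute.

```python
from dataclasses import dataclass, field


@dataclass
class Keyspace:
    """A named container with keyspace-wide attributes (conceptual model only)."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Cluster:
    """The outermost structure: a named container for keyspaces."""
    name: str
    keyspaces: dict = field(default_factory=dict)

    def create_keyspace(self, name, **attributes):
        """Register a new keyspace; names must be unique within the cluster."""
        if name in self.keyspaces:
            raise ValueError(f"keyspace {name!r} already exists")
        keyspace = Keyspace(name, dict(attributes))
        self.keyspaces[name] = keyspace
        return keyspace


# Names taken from the Twitter example in the text; the attribute is a placeholder.
cluster = Cluster("Twitter-Cluster")
twitter = cluster.create_keyspace("Twitter", example_setting="example_value")
```

Nothing in this model prevents calling `create_keyspace` more than once on the same cluster, which mirrors the point that several keyspaces can share a cluster.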

In Cassandra, the basic attributes that you can set per keyspace are:
