in a regular column family is a row key pointing to a column name pointing to a value, while
the address of a value in a column family of type “super” is a row key pointing to a column
name pointing to a subcolumn name pointing to a value. Put slightly differently, a row in a super
column family still contains columns, each of which then contains subcolumns.
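The two addressing schemes can be mimicked with nested dictionaries. What follows is only an illustrative sketch in Python, not Cassandra's API: the column family names, row keys, column names, and values are all made-up sample data, and `get_value` is a hypothetical helper.

```python
# Illustrative only: plain nested dicts standing in for column families.
# Every key and value below is hypothetical sample data.

# Regular column family: row key -> column name -> value
users = {
    "jsmith": {"email": "jsmith@example.com", "city": "Phoenix"},
}

# Super column family: row key -> column name -> subcolumn name -> value
user_addresses = {
    "jsmith": {
        "home": {"street": "1 Main St", "city": "Phoenix"},
        "work": {"street": "9 Elm Ave", "city": "Tempe"},
    },
}


def get_value(column_family, row_key, column, subcolumn=None):
    """Follow the address of a value; a super column family needs one more step."""
    column_value = column_family[row_key][column]
    if subcolumn is None:
        return column_value          # regular: the column holds the value
    return column_value[subcolumn]   # super: the column holds subcolumns


regular_value = get_value(users, "jsmith", "email")
super_value = get_value(user_addresses, "jsmith", "work", "city")
print(regular_value)
print(super_value)
```

The only difference between the two lookups is the extra level of nesting, which is exactly the difference between the two kinds of column family described above.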
So that's the bottom-up approach to looking at Cassandra's data model. Now that we have this
basic understanding, let's switch gears and zoom out to a higher level, in order to take a
top-down approach. There is so much confusion on this topic that it's worth restating things in a
different way in order to thoroughly understand the data model.
Clusters
Cassandra is probably not the best choice if you only need to run a single node. As previously
mentioned, the Cassandra database is specifically designed to be distributed over several
machines operating together that appear as a single instance to the end user. So the outermost
structure in Cassandra is the cluster, sometimes called the ring, because Cassandra assigns data to
nodes in the cluster by arranging them in a ring.
Each node holds replicas for different ranges of data. If one node goes down, a replica on another
node can respond to queries. The peer-to-peer protocol allows the data to replicate across nodes in
a manner transparent to the user, and the replication factor is the number of machines in your
cluster that will receive copies of the same data. We'll examine this in greater detail in Chapter 6.
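To make the ring and the replication factor concrete, here is a deliberately simplified sketch in Python. It is not Cassandra's implementation: the token space, the use of MD5, the node names, and the rule of placing copies on successive nodes around the ring are all assumptions chosen for illustration, and `token_for` and `replicas_for` are hypothetical helpers.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 32  # hypothetical token space, chosen only for this sketch


def token_for(key: str) -> int:
    """Hash a key onto the ring (MD5 is an arbitrary choice for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def replicas_for(key, ring, replication_factor):
    """Return the names of the nodes that would hold copies of `key`.

    `ring` is a list of (token, node_name) pairs sorted by token. The first
    copy goes to the node with the first token >= the key's token, wrapping
    around at the end of the ring; further copies go to the nodes after it.
    """
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))  # cannot exceed the node count
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Four hypothetical nodes spaced evenly around the ring.
ring = [(i * RING_SIZE // 4, f"node{i + 1}") for i in range(4)]

replicas = replicas_for("jsmith", ring, replication_factor=3)
print(replicas)
```

With a replication factor of 3 on this four-node ring, every key lands on three distinct nodes, so any single node can fail and two copies remain available to answer queries.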
Keyspaces
A cluster is a container for keyspaces, typically a single keyspace. A keyspace is the outermost
container for data in Cassandra, corresponding closely to a relational database. Like a relational
database, a keyspace has a name and a set of attributes that define keyspace-wide behavior.
Although people frequently advise that it's a good idea to create a single keyspace per application,
this doesn't appear to have much practical basis. It's certainly an acceptable practice, but it's
perfectly fine to create as many keyspaces as your application needs. Note, however, that you will
probably run into trouble creating thousands of keyspaces per application.
Depending on your security constraints and partitioner, it's fine to run multiple keyspaces on the
same cluster. For example, if your application is called Twitter, you would probably have a cluster
called Twitter-Cluster and a keyspace called Twitter. To my knowledge, there are currently
no naming conventions in Cassandra for such items.
In Cassandra, the basic attributes that you can set per keyspace are: