Database Reference
In-Depth Information
One obvious benefit of having such a flexible data storage mechanism is that you can
have an arbitrary number of cells with customized names and have a partition key store data
as a list of tuples (a tuple is an ordered set; in this case, the tuple is a key-value pair). This
comes in handy when you have to store things such as time series. For example, you might want
to use Cassandra to store your Facebook timeline or your Twitter feed, or you might want the
partition key to be a sensor ID and each cell to represent a tuple with the name as the
timestamp when the data was created and the value as the data sent by the sensor. Also, within a
partition, cells are by default ordered by the cell's name. So, in our sensor case,
you get the data sorted by timestamp for free. The other difference is that, unlike an RDBMS,
Cassandra does not have relations. This means relational logic has to be handled at the
application level. This also means that we may want to denormalize the database: because there
are no joins, denormalizing avoids looking up multiple tables by running multiple queries.
Denormalization is the process of adding redundancy to data to achieve high read performance.
For more information, visit http://en.wikipedia.org/wiki/Denormalization .
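The sensor example above can be sketched in a few lines of Python. This is only an in-memory illustration of the idea (a partition key mapping to cells kept sorted by cell name), not how Cassandra stores data on disk; the names partitions, write_cell, and read_range, and the sample readings, are made up for this sketch.

```python
import bisect
from collections import defaultdict

# Hypothetical in-memory model: partition key -> list of (cell name, value)
# tuples kept sorted by cell name. Here the cell name is a timestamp.
partitions = defaultdict(list)

def write_cell(sensor_id, timestamp, reading):
    """Insert a (timestamp, reading) cell, keeping the partition sorted.

    Unlike Cassandra, this sketch does not overwrite a cell that already
    exists under the same name.
    """
    bisect.insort(partitions[sensor_id], (timestamp, reading))

def read_range(sensor_id, start, end):
    """Return cells with start <= timestamp < end, in timestamp order."""
    cells = partitions.get(sensor_id, [])
    lo = bisect.bisect_left(cells, (start,))
    hi = bisect.bisect_left(cells, (end,))
    return cells[lo:hi]

# Cells arrive out of order but are stored sorted by timestamp.
write_cell("sensor-1", 1700000300, 21.9)
write_cell("sensor-1", 1700000100, 21.4)
write_cell("sensor-1", 1700000200, 21.7)
write_cell("sensor-2", 1700000150, 18.0)

recent = read_range("sensor-1", 1700000150, 1700000301)
```

Because the cells are already ordered by name, a time-range read is a contiguous slice of one partition rather than a sort at query time.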
Partitions are distributed across the cluster, creating effective auto-sharding. Each server
holds one or more ranges of keys. So, if balanced, a cluster with more nodes will have fewer rows per
node. All these concepts are covered in detail in later chapters.
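A rough way to picture this auto-sharding is a ring of token ranges, with each node owning one range. The sketch below is a simplification under stated assumptions: it gives every node a single evenly spaced token and hashes keys with MD5 from the standard library purely for illustration. The Ring class, token_for, and the node names are hypothetical and are not Cassandra's API.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2 ** 32  # hypothetical token space, kept small for readability

def token_for(partition_key):
    """Map a partition key to a token in [0, RING_SIZE)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

class Ring:
    def __init__(self, node_names):
        # One evenly spaced token per node; a node owns all tokens up to
        # and including its own token.
        step = RING_SIZE // len(node_names)
        self.tokens = [(i + 1) * step - 1 for i in range(len(node_names))]
        self.nodes = list(node_names)

    def node_for(self, partition_key):
        idx = bisect.bisect_left(self.tokens, token_for(partition_key))
        return self.nodes[idx % len(self.nodes)]  # wrap around the ring

keys = [f"sensor-{i}" for i in range(3000)]
small = Ring(["n1", "n2", "n3"])
large = Ring(["n1", "n2", "n3", "n4", "n5", "n6"])

three = Counter(small.node_for(k) for k in keys)
six = Counter(large.node_for(k) for k in keys)
```

Comparing three and six shows the effect described above: with the same keys spread over more nodes, each node ends up holding fewer partitions.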
Note
Types of keys
In the context of Cassandra, you may find the concept of keys a bit confusing. There are
five terms that you may encounter. Here is what they generally mean:
Primary key : This is the column or group of columns that uniquely identifies a
row of the CQL table.
Composite key : This is a type of primary key that is made up of more than one
column. Sometimes, the composite key is also referred to as the compound key.
Partition key : Cassandra's internal data representation is large rows with a
unique key called the row key. It uses these row key values to distribute data across
cluster nodes. Since these row keys are used to partition data, they are called
partition keys. When you define a table with a simple key, that key is the partition key.
If you define a table with a composite key, the first term of that composite key
works as the partition key. This means all the CQL rows with the same partition
key live on one machine.
Clustering key : This is the column that tells Cassandra how the data within a
partition is ordered (or clustered). This essentially provides presorted retrieval if
you know what order you want your data to be retrieved in.
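To see how these terms fit together, here is a small Python sketch. It assumes a hypothetical table whose composite primary key is (sensor_id, reading_time); the column names, sample rows, and the build_partitions helper are invented for illustration. It models only grouping and ordering, and does not enforce primary key uniqueness.

```python
from collections import defaultdict

# Hypothetical composite primary key: (sensor_id, reading_time)
PRIMARY_KEY = ("sensor_id", "reading_time")
PARTITION_KEY = PRIMARY_KEY[0]    # first term of the composite key
CLUSTERING_KEY = PRIMARY_KEY[1:]  # remaining columns order rows in a partition

def build_partitions(rows):
    """Group rows by partition key, then sort each group by clustering key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[PARTITION_KEY]].append(row)
    for group in grouped.values():
        group.sort(key=lambda r: tuple(r[c] for c in CLUSTERING_KEY))
    return dict(grouped)

rows = [
    {"sensor_id": "s1", "reading_time": 30, "value": 21.9},
    {"sensor_id": "s2", "reading_time": 10, "value": 18.0},
    {"sensor_id": "s1", "reading_time": 10, "value": 21.4},
    {"sensor_id": "s1", "reading_time": 20, "value": 21.7},
]

table = build_partitions(rows)
s1_times = [r["reading_time"] for r in table["s1"]]
```

All rows sharing the partition key s1 land in one group, and within that group they come back already ordered by reading_time, which is the presorted retrieval the clustering key provides.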