Internally, Cassandra represents this table as a data structure known as a
column family. Despite appearing to contain a large number of rows, the
column family actually only contains a row for each customer_id and a
large number of columns for each metric and ts combination. This is
really only important to note because there is a limit on the number of
columns a given row can have: 2 billion columns or 2 gigabytes of storage.
These limitations can be reached quite easily in some time-series
implementations. To overcome them, Cassandra allows multiple fields to be
combined into the row identifier. The trade-off is that the row key is also
used to partition the data across the Cassandra cluster, so all queries,
inserts, and updates must contain all of the elements of the row key.
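As a rough illustration of how quickly the column limit can be reached, assume (hypothetically; these figures are not from the text) a single customer_id that reports 100 metrics once per second. With one column per metric and ts combination, that customer's row grows as follows:

```latex
\begin{align*}
\text{columns per day} &= 100 \times 86{,}400 = 8{,}640{,}000 \\
\text{days to reach the limit} &= \frac{2 \times 10^{9}}{8.64 \times 10^{6}} \approx 231
\end{align*}
```

Under these assumed rates, one busy customer would hit the 2 billion column limit in well under a year, which is why splitting the row key further can be worthwhile.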
If the query will always include the customer_id and the metric, merging
the customer_id and metric fields would create rows identified by
customer_id:metric combinations with a column for each timestamp:
cqlsh:metrics> CREATE TABLE counts_composite (
    customer_id INT,
    metric TEXT,
    ts TIMESTAMP,
    value COUNTER,
    value_2 COUNTER,
    PRIMARY KEY ((customer_id, metric), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
Adding the CLUSTERING ORDER BY clause tells Cassandra to sort the
columns within each row in descending timestamp order instead of the
natural order for a timestamp column, which would be ascending.
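The statements below are a minimal sketch of how the counts_composite table might be used, assuming the metrics keyspace is in use. The customer ID, metric name, and timestamp are hypothetical values chosen for illustration.

```cql
-- Increment a counter. Counter columns are changed with UPDATE, and the
-- WHERE clause names every element of the primary key.
UPDATE counts_composite
   SET value = value + 1
 WHERE customer_id = 42
   AND metric = 'page_views'
   AND ts = '2014-01-01 00:00:00+0000';

-- Read one row. Both parts of the row key (customer_id and metric) are
-- supplied. Because of CLUSTERING ORDER BY (ts DESC), the first ten
-- results are the ten most recent timestamps.
SELECT ts, value
  FROM counts_composite
 WHERE customer_id = 42
   AND metric = 'page_views'
 LIMIT 10;

-- By contrast, a query that supplies only part of the row key, such as
--   SELECT * FROM counts_composite WHERE customer_id = 42;
-- would be expected to be rejected, since metric is missing.
```

Storing the newest columns first suits the common time-series case of reading the latest values, since a LIMIT query does not need an explicit ORDER BY.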
As in most relational databases, you can alter tables after they've been
created using the ALTER TABLE command. The most common use case is to
add a column to an existing table or to remove an existing column. Adding a
new column does not cause any validation of existing rows.
Dropping a column will also eventually cause the deletion of the data
associated with that column, but this does not happen until a major
compaction occurs.
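As a brief sketch, adding and dropping columns on the counts_composite table might look like the following. The column name value_3 is hypothetical, and the exact restrictions on altering counter tables can vary between Cassandra versions.

```cql
-- Add a new counter column; existing rows are not validated or rewritten.
ALTER TABLE counts_composite ADD value_3 COUNTER;

-- Drop an existing column; its data is removed later, during compaction.
ALTER TABLE counts_composite DROP value_2;
```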