Storing Streaming Data - Real-Time Analytics

Internally, Cassandra represents this table as a data structure known as a

column family. Despite appearing to contain a large number of rows, the

column family actually only contains a row for each customer_id and a

large number of columns for each metric and ts combination. This is

really only important to note because there is a limit on the number of

columns a given row can have: 2 billion columns or 2 gigabytes of storage.

These limitations can be reached quite easily in some time-series

implementations. To overcome them, Cassandra allows multiple keys to be

used as the row identifier. The disadvantage to doing this is that the row key

is also used to partition the data across the Cassandra cluster. This means

that all queries, inserts, or updates must contain all of the elements of the

row key.

If the query will always include the customer_id and the metric, merging

the customer_id and metric fields would create rows identified by

customer_id:metric combinations with a column for each timestamp:

cqlsh:metrics> CREATE TABLE counts_composite (

customer_id INT,

metric TEXT,

ts TIMESTAMP,

value COUNTER,

value_2 COUNTER,

PRIMARY KEY ((customer_id, metric), ts)

) WITH CLUSTERING ORDER BY (ts DESC);

Adding the CLUSTERING ORDER BY clause tells Cassandra to store the

columns within each row in descending ts order instead of the natural order

for a timestamp column, which would be ascending.
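
As a rough illustration of how this table might be written and read, the statements below supply both parts of the row key every time. The customer_id 42, the metric name 'page_views', and the timestamp are made-up values, not taken from the text.

```cql
-- Counter columns are changed with UPDATE; both row-key parts and ts are required.
UPDATE counts_composite SET value = value + 1
 WHERE customer_id = 42 AND metric = 'page_views'
   AND ts = '2014-01-01 00:00:00+0000';

-- Reads name the full row key; results come back newest first
-- because of CLUSTERING ORDER BY (ts DESC).
SELECT ts, value FROM counts_composite
 WHERE customer_id = 42 AND metric = 'page_views'
 LIMIT 10;

-- The stored order can be reversed for a single query.
SELECT ts, value FROM counts_composite
 WHERE customer_id = 42 AND metric = 'page_views'
 ORDER BY ts ASC LIMIT 10;
```

A query that restricts only customer_id is normally rejected unless filtering is explicitly allowed, since Cassandra cannot tell which partition to read from half of the row key.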

As in most relational databases, you can alter tables after they've been

created using the ALTER TABLE command. The most common use case is to

add a column to an existing table or to remove an existing column. Adding a

new column does not cause any validation of existing rows.

Dropping a column will also eventually cause the deletion of the data

associated with that column, but this does not happen until a major

compaction occurs.
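
A minimal sketch of both operations against the counts_composite table follows. The column name value_3 is hypothetical, and it is declared as a COUNTER on the assumption that a counter table accepts only counter columns outside the primary key.

```cql
-- Add a new counter column; existing rows are not rewritten or validated.
ALTER TABLE counts_composite ADD value_3 COUNTER;

-- Remove a column; its data stays on disk until compaction rewrites the files.
ALTER TABLE counts_composite DROP value_2;
```

Support for dropping columns can vary by Cassandra version, so it is worth checking the behavior on the cluster in use before relying on it.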
