Summary
In this chapter, we explored strategies for aggregating observed time-series data, in this
case user behavior in viewing status updates in our application. While user behavior
analytics is a fantastic and common use case for Cassandra, we could take the same
approach to aggregate scientific data, economic data, or anything else where we'd like to
roll up discrete observations into high-level aggregate values.
Our structure for recording time-series data used a table of discrete observations as the
raw material; that table also serves as the data of record in case we want to introduce new
aggregate dimensions down the line. We also used a table that precomputed aggregate
observations by day. By keeping the aggregate up to date at write time, we built a structure
that lets us retrieve aggregates over a given time period very efficiently, without any
expensive computation at read time. We can easily imagine constructing dozens of such
tables, one for each level of granularity at which we would like to analyze aggregate
information.
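
As a rough illustration of this two-structure pattern, the following Python sketch models
it in memory rather than in Cassandra. The names raw_observations, daily_views,
record_view, and views_between are hypothetical stand-ins chosen for this example; they
are not the tables or functions from the chapter.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# In-memory stand-ins (assumptions for illustration, not real Cassandra tables).
raw_observations = []    # the table of discrete observations
daily_views = Counter()  # the table of aggregates precomputed by day


def record_view(status_id, observed_at):
    """Store the raw observation and bump the matching daily aggregate."""
    raw_observations.append((status_id, observed_at))
    daily_views[(status_id, observed_at.date())] += 1


def views_between(status_id, first_day, last_day):
    """Sum the precomputed daily totals; the raw observations are never scanned."""
    total = 0
    day = first_day
    while day <= last_day:
        total += daily_views[(status_id, day)]
        day += timedelta(days=1)
    return total


record_view("status-1", datetime(2014, 6, 1, 9, 30))
record_view("status-1", datetime(2014, 6, 1, 17, 5))
record_view("status-1", datetime(2014, 6, 2, 8, 0))
record_view("status-2", datetime(2014, 6, 2, 8, 1))

two_day_total = views_between("status-1", date(2014, 6, 1), date(2014, 6, 2))
```

The read path touches one aggregate entry per day in the requested range, however many raw
observations were recorded, which is the trade this structure makes: a little extra work
on every write in exchange for cheap reads.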
We explored using counter columns to maintain the precomputed aggregates. Each time we
made an observation, we incremented the relevant counter columns. Because a counter
increment is expressed as an UPDATE statement, we could record observations by issuing a
series of UPDATE statements, without having to read the current aggregate values from
Cassandra first.
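
To make the shape of such a write concrete, here is a small Python sketch that assembles
the text of a CQL counter increment. The helper counter_increment_cql, the table name
daily_status_update_views, and the column names are assumptions made for illustration;
substitute the names from your own schema.

```python
def counter_increment_cql(table, counter_columns, key_columns):
    """Build a CQL UPDATE that adds 1 to each counter column of one row.

    The statement is relative (column = column + 1), so the client never
    needs to read the current value before writing.
    """
    assignments = ", ".join(f"{c} = {c} + 1" for c in counter_columns)
    predicates = " AND ".join(f"{k} = ?" for k in key_columns)
    return f"UPDATE {table} SET {assignments} WHERE {predicates};"


# Hypothetical table and column names, for illustration only.
statement = counter_increment_cql(
    "daily_status_update_views",
    ["total_views", "web_views"],
    ["status_update_id", "view_date"],
)
print(statement)
```

The question marks are bind markers for a prepared statement; the key values would be
supplied by whichever driver executes the statement.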
We saw that, while counter columns are a useful tool for precomputed data aggregation,
they also have their downsides. Counter columns do not allow us to set values directly; we
can only increment or decrement them. Because deletion of a counter column value is
permanent, deletion is of little use in a counter column table. We also saw that counter
columns can share a table only with other counter columns and the table's primary key
columns; they can't be in the same table as ordinary data columns or collection columns.
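
The coexistence rule can be expressed as a simple check over a table definition. The
sketch below is our own illustration of the rule, not Cassandra's validation logic; the
function check_counter_table and the example schemas are hypothetical.

```python
def check_counter_table(primary_key, columns):
    """Raise ValueError if counter columns are mixed with other non-key columns.

    primary_key: list of column names; columns: dict of column name -> CQL type.
    """
    non_key = {name: cql_type for name, cql_type in columns.items()
               if name not in primary_key}
    has_counter = "counter" in non_key.values()
    offenders = sorted(name for name, cql_type in non_key.items()
                       if cql_type != "counter")
    if has_counter and offenders:
        raise ValueError(
            "non-counter columns in a counter table: " + ", ".join(offenders))


# Hypothetical schemas, for illustration only.
key = ["status_update_id", "view_date"]
valid = {
    "status_update_id": "uuid",
    "view_date": "timestamp",
    "total_views": "counter",
    "web_views": "counter",
}
mixed = dict(valid, last_viewer="text")  # adds an ordinary data column

check_counter_table(key, valid)  # passes silently
try:
    check_counter_table(key, mixed)
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```

In practice this means that descriptive data about the thing being counted lives in a
separate table from its counters.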
In the next chapter, we will look more deeply into how Cassandra stores and retrieves data,
with particular focus on how data is distributed among multiple machines in a multinode
cluster, the typical configuration of a production Cassandra deployment. You'll learn how
Cassandra handles conflicting updates to the same piece of data using timestamps, and
you'll see how we can override those timestamps to interesting effect. You'll also learn
more about what happens when data is deleted from Cassandra, and use that knowledge to
avoid common pitfalls with data deletion.