Aggregating Time-Series Data - Learning Apache Cassandra

Using discrete analytics observations

The status_update_views table gives us a complete and highly discrete view of the usage data we're collecting, but it is limited in the questions it can answer. By analyzing the primary key structure, we know that the question it is best suited to answer is: "What do we know about each view of a given status update in a given time range?" We can answer this question using a range slice query of the following form:

SELECT "observed_at", "client_type"
FROM "status_update_views"
WHERE "status_update_username" = 'alice'
  AND "status_update_id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02
  AND "observed_at" >= MINTIMEUUID('2014-10-05 00:00:00+0000')
  AND "observed_at" < MINTIMEUUID('2014-10-06 00:00:00+0000');

This query returns information about every observed view of the status update on October 5, 2014 (UTC). Because we are asking for rows in a single partition and a specified range of clustering column values, Cassandra can perform the query very efficiently.
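
The table definition is not shown in this section. As a minimal sketch, a schema consistent with the query above might look like the following. The column types and the exact primary key layout are assumptions inferred from the literals and restrictions the query uses, not a definition taken from the text.

```cql
-- Hypothetical schema sketch: types are inferred from the query, not confirmed.
CREATE TABLE "status_update_views" (
  "status_update_username" text,      -- compared against the string 'alice'
  "status_update_id" timeuuid,        -- compared against a version 1 UUID literal
  "observed_at" timeuuid,             -- compared against MINTIMEUUID(...) bounds
  "client_type" text,                 -- assumed to be a short label
  PRIMARY KEY (
    ("status_update_username", "status_update_id"),  -- partition key
    "observed_at"                                     -- clustering column
  )
);
```

Under this layout, all views of one status update live in a single partition, ordered by "observed_at", so a time range corresponds to one contiguous slice of that partition.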

What about other questions we'd like to ask regarding the user behavior we've observed? One thing we can do is simply ask how many views of a status update have been observed in a given time range:

SELECT COUNT(1)
FROM "status_update_views"
WHERE "status_update_username" = 'alice'
  AND "status_update_id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02
  AND "observed_at" >= MINTIMEUUID('2014-10-05 00:00:00+0000')
  AND "observed_at" < MINTIMEUUID('2014-10-06 00:00:00+0000');

Compared with retrieving the data in each row and then counting the number of rows we get back, we do save some data transfer using a COUNT query. However, under the hood,
