Using discrete analytics observations
The status_update_views table gives us a complete and highly discrete view of the
usage data we're collecting, but it is limited in the questions it can answer. From the
primary key structure, we know that the question it is best suited to answer is: "What do
we know about each view of a given status update in a given time range?" We can answer
this question using a range slice query of the following form:
SELECT "observed_at", "client_type"
FROM "status_update_views"
WHERE "status_update_username" = 'alice'
  AND "status_update_id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02
  AND "observed_at" >= MINTIMEUUID('2014-10-05 00:00:00+0000')
  AND "observed_at" < MINTIMEUUID('2014-10-06 00:00:00+0000');
This query will give us information about all the observed status update views on October
5, 2014. We know that, because we are asking for rows in a single partition and a specified
range of clustering columns, Cassandra can perform the query very efficiently.
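To make the efficiency argument concrete, here is a minimal sketch in Python of a range slice over a single partition. It is a toy model under stated assumptions, not Cassandra's actual storage engine: plain datetimes stand in for timeuuids, the sample rows and client types ("web", "ios", "android") are hypothetical, and the name range_slice is invented for illustration. The point it shows is that when rows are kept sorted by the clustering column, a half-open range can be located with two binary searches instead of a scan.

```python
from bisect import bisect_left
from datetime import datetime, timezone

# Toy model: the rows of ONE partition, kept sorted by the clustering
# column (observed_at). Sample data is hypothetical.
partition = sorted([
    (datetime(2014, 10, 4, 23, 59, tzinfo=timezone.utc), "web"),
    (datetime(2014, 10, 5, 0, 0, tzinfo=timezone.utc), "ios"),
    (datetime(2014, 10, 5, 13, 30, tzinfo=timezone.utc), "web"),
    (datetime(2014, 10, 6, 0, 0, tzinfo=timezone.utc), "android"),
])
keys = [observed_at for observed_at, _ in partition]


def range_slice(start, end):
    """Return rows with start <= observed_at < end via two binary searches."""
    lo = bisect_left(keys, start)  # first row at or after start
    hi = bisect_left(keys, end)    # first row at or after end (excluded)
    return partition[lo:hi]


# Same half-open range as the query: all of October 5, 2014 (UTC).
day = range_slice(
    datetime(2014, 10, 5, tzinfo=timezone.utc),
    datetime(2014, 10, 6, tzinfo=timezone.utc),
)
print([client for _, client in day])  # ['ios', 'web']
```

The row stamped exactly at midnight on October 5 is included and the one at midnight on October 6 is excluded, mirroring the >= and < bounds in the query.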
What about other questions we'd like to ask regarding the user behavior we've observed?
One thing we can do is simply ask how many total views of a status update have been
observed in a given time range:
SELECT COUNT(1)
FROM "status_update_views"
WHERE "status_update_username" = 'alice'
  AND "status_update_id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02
  AND "observed_at" >= MINTIMEUUID('2014-10-05 00:00:00+0000')
  AND "observed_at" < MINTIMEUUID('2014-10-06 00:00:00+0000');
Compared with retrieving the data in each row and then counting the number of rows we
get back, we do save some data transfer using a COUNT query. However, under the hood,