Database Reference
In-Depth Information
The many possibilities for aggregation
Having satisfied ourselves that precomputed aggregation is worth adding some complexity
to the process of recording an observation, let's explore some other ways in which we
might want to aggregate data. Here are a few examples of questions we might want to an-
swer, and the type of table schema we might set up to efficiently store precomputed aggreg-
ates to answer these questions:
Partition
Key
Clustering
Column
Question
How many total views did status updates receive in September?
year
date
What percentage of status update views are via the web, via our mobile app, and via
third-party applications?
month
client_type
Which hours of the day are most popular for viewing status updates?
month
hour_of_day
What is the average monthly growth of status update views this year?
year
month
In each case, we choose a partition key that divides our observations into large chunks of
time that we'll most likely want to query within, rather than across; then we choose a clus-
tering column that aggregates data using the appropriate level of granularity.
The role of discrete observations
One question that might come to concern us is the purpose of the first table we created in
this chapter: status_update_views . As we observed, this table isn't terribly useful
for answering any aggregate questions about our usage data, so why store it at all?
As it turns out, we probably won't interact directly with status_update_views when
exploring our analytics observations on a day-to-day basis. However,
status_update_views stores the raw material for all of the aggregate tables. If, in a
few months, we decide that we'd like to aggregate data on a previously unforeseen dimen-
sion, we can backfill our aggregates using the raw observations in
status_update_views . Keeping the raw data around gives us some flexibility when
it comes to designing new aggregates down the road.
Search WWH ::




Custom Search