Other pipelines create trends and user/video clustering data that is also
written to Cassandra.
Cassandra is also used for real-time stream access and other real-time user
management.
That all worked really well for quite a while. As we started to grow larger and
larger as a company, we had to scale various parts of the solution at different rates.
We would spin up single-use Cassandra clusters, and they would just hum along in
the background with little to no management (which we don't recommend). Eventually,
we realized that we had to bring all our concerns back together and build a
modern analytics stack. We had success doing stream processing of our real-time
data in Storm and realized we could use a similar system to write our data into
Cassandra as it arrived.
For our next-generation analytics stack, we started by writing the more than two
billion raw video events per day (over 25,000 per second) into a big Cassandra cluster.
They are written into a time-series wide-row schema, which looks like what is
shown in Listing 12.1.
Listing 12.1 Raw Event Data and Event Attribute Data
Event CF:
  2013-08-26#++0BoO6:
    2013-09-14T22:06:29.000Z: {"eventType":1, "lastEventTime":1379196388}
    2013-09-14T22:06:29.001Z: {"eventType":19, "firstForPlayer":true}

EventAttr CF:
  2013-08-26#++0BoO6:
    ipaddr: 174.89.195.19
    region: ontario
    videoId: 21856838
    providerId: 25322
    countryCode: ca
    device-type: Tablet
Storing user attributes in a separate ColumnFamily enables us to do easy
filtering/indexing, and it makes it possible to update attributes after ingestion.
It also saves a lot of space.
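The split between the two column families can be illustrated with a small in-memory sketch. This is not the authors' code and it does not touch Cassandra at all: the dictionaries `event_cf` and `event_attr_cf`, and the helpers `write_event`, `set_attrs`, `rows_matching`, and `events_in_order`, are hypothetical names used only to mirror the layout of Listing 12.1. The sketch assumes that the timestamp column names sort chronologically as strings, which holds for the fixed-width ISO format shown in the listing.

```python
import json
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two column families in Listing 12.1.
event_cf = defaultdict(dict)       # row key -> {timestamp column -> JSON event}
event_attr_cf = defaultdict(dict)  # row key -> {attribute name -> value}


def write_event(row_key, timestamp, event):
    """Add one event as a new timestamp-named column in the wide row."""
    event_cf[row_key][timestamp] = json.dumps(event, sort_keys=True)


def set_attrs(row_key, attrs):
    """Write or update attributes without touching any event columns."""
    event_attr_cf[row_key].update(attrs)


def rows_matching(wanted):
    """Return the row keys whose attributes equal every requested value."""
    return sorted(
        key for key, attrs in event_attr_cf.items()
        if all(attrs.get(name) == value for name, value in wanted.items())
    )


def events_in_order(row_key):
    """Return (timestamp, event) pairs sorted by column name, i.e. by time."""
    columns = event_cf[row_key]
    return [(ts, json.loads(columns[ts])) for ts in sorted(columns)]


ROW_KEY = "2013-08-26#++0BoO6"  # row key copied from Listing 12.1

# Events may arrive out of order; the column name keeps them sorted on read.
write_event(ROW_KEY, "2013-09-14T22:06:29.001Z",
            {"eventType": 19, "firstForPlayer": True})
write_event(ROW_KEY, "2013-09-14T22:06:29.000Z",
            {"eventType": 1, "lastEventTime": 1379196388})

# Attributes are stored once per row rather than repeated on every event.
set_attrs(ROW_KEY, {"region": "ontario", "countryCode": "ca",
                    "device-type": "Tablet"})
```

Filtering here only reads the attribute rows, for example `rows_matching({"countryCode": "ca"})`, and a later `set_attrs(ROW_KEY, {"region": "quebec"})` changes the attributes without rewriting any stored event. That is the property the separate ColumnFamily is meant to provide.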
From here, the challenge is how we turn the mountain of raw events into small,
actionable nuggets of truth. And how do we make the development and query-
 