Figure 12.1 Data flow in the real-time insight platform
Data Model
Figure 12.2 offers only a glimpse into the data model of many of our systems. We
have dozens of ColumnFamilys depending on the use case, application requirements,
and query patterns. The ones shown in Figure 12.2 are the more generic ones,
common to nearly all our use cases. As shown, the Raw Event Data ColumnFamily
stores raw event data sorted by time. For this ColumnFamily, the column
key (column name) is the time when the event occurred. Since column names are
stored sorted in Cassandra, this enables physical sorting of time-series data on the
disk as it enters the system. This physical sorting (versus lazily sorting data upon
read) enables efficient range scans on time-series data. Note that the row key is a
combination of time and event type. We can't use just the hour as a row key
because every write during that hour would land in a single row, and so on a
single set of replicas, which would likely create a hot spot even when using a
RandomPartitioner. Each kind of rollup, per minute or per hour, is stored in its
own counter ColumnFamily. In addition to these simple rollups, we also do complex
filtering and aggregations by multiple dimensions, all in real time.
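
The layout described above can be illustrated with a small in-memory model. This
is only a sketch under stated assumptions, not the platform's actual code and not
a Cassandra client API: the names row_key, RawEventStore, and RollupCounters, and
the "event_type:YYYYMMDDHH" key format, are hypothetical, since the text says only
that the row key combines time and event type. Plain Python lists and counters
stand in for the ColumnFamilys.

```python
import bisect
from collections import Counter, defaultdict
from datetime import datetime, timezone


def row_key(event_type, ts):
    """Hypothetical row key: event type combined with the event's hour bucket."""
    return f"{event_type}:{ts.strftime('%Y%m%d%H')}"


class RawEventStore:
    """In-memory stand-in for a ColumnFamily whose column names are timestamps."""

    def __init__(self):
        # row key -> (sorted column names, payloads in the same order)
        self.rows = defaultdict(lambda: ([], []))

    def insert(self, event_type, ts, payload):
        # Inserting at the sorted position mimics column names being kept
        # in sorted order as data enters the system.
        names, values = self.rows[row_key(event_type, ts)]
        i = bisect.bisect_right(names, ts)
        names.insert(i, ts)
        values.insert(i, payload)

    def range_scan(self, event_type, start, end):
        """Return payloads with start <= ts < end within start's hour bucket."""
        names, values = self.rows.get(row_key(event_type, start), ([], []))
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_left(names, end)
        return values[lo:hi]  # contiguous slice, no sorting at read time


class RollupCounters:
    """Stand-in for separate per-minute and per-hour counter ColumnFamilys."""

    def __init__(self):
        self.per_minute = Counter()
        self.per_hour = Counter()

    def record(self, event_type, ts):
        self.per_minute[(event_type, ts.strftime("%Y%m%d%H%M"))] += 1
        self.per_hour[(event_type, ts.strftime("%Y%m%d%H"))] += 1


store = RawEventStore()
rollups = RollupCounters()
base = datetime(2013, 1, 1, 10, 0, tzinfo=timezone.utc)

# Events arrive out of order; the store keeps them sorted by time.
for minute, payload in [(30, "c"), (5, "a"), (12, "b")]:
    ts = base.replace(minute=minute)
    store.insert("click", ts, payload)
    rollups.record("click", ts)

print(store.range_scan("click", base, base.replace(minute=15)))  # ['a', 'b']
print(rollups.per_hour[("click", "2013010110")])  # 3
```

Because the event type is part of the row key, events of different types that
occur in the same hour fall into different rows, which is the property the text
relies on to spread writes. In this sketch a range scan covers a single hour
bucket; a scan across several hours would need one lookup per bucket.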