Storing Values in Column Names
It's a common practice with a CFDB design to store a value (actual data) in the column
name (a.k.a. column key), and even to leave the column value field empty if there is
nothing else to store. One motivation for this practice is that column names are stored
physically sorted, but column values are not.
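The idea can be illustrated with a minimal in-memory sketch. It is not a CFDB client: the `WideRow` class, the `followers` row, and the user names are hypothetical, and the sorted list only stands in for the physical ordering of column names described above.

```python
import bisect


class WideRow:
    """Toy stand-in for one row: column names are kept sorted, values are optional."""

    def __init__(self):
        self._names = []    # column names, always in sorted order
        self._values = {}   # column name -> column value (may be empty)

    def insert(self, name, value=b""):
        # Only new names need a slot in the sorted list; rewrites replace the value.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def columns(self):
        # Columns come back ordered by name, regardless of insertion order.
        return [(name, self._values[name]) for name in self._names]


followers = WideRow()  # hypothetical row, e.g. row key "followers:alice"
for user in ["zoe", "bob", "mallory"]:
    followers.insert(user)  # the data is the column name; the value stays empty

names = [name for name, _ in followers.columns()]
print(names)  # ['bob', 'mallory', 'zoe']
```

The values play no role here: everything the reader needs is in the column names, and they come back in sorted order.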
Notes
The maximum column key (and row key) size is 64 KB. Even so, keep keys
short: don't store something like an “item description” as the column key!
Don't use a timestamp alone as a column key. Two or more app servers
writing to CFDB can produce the same timestamp, and the later write then
overwrites the earlier column. Prefer a time-UUID instead.
The maximum column value size is 2 GB. But because there is no
streaming and the whole value is loaded into heap memory when
requested, limit values to a few MB.
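The timestamp collision in the second note can be sketched with Python's standard `uuid` module, whose `uuid1()` produces a time-based (version 1) UUID. This is only an illustration: the dictionaries stand in for a row, the timestamp and the two `node` values are made-up placeholders for two app servers, and a real CFDB client would generate and compare time-UUIDs in its own way.

```python
import uuid

timestamp_ms = 1_700_000_000_000  # hypothetical: both servers see the same millisecond

# Timestamp alone as the column key: the second write replaces the first.
row_by_timestamp = {}
row_by_timestamp[timestamp_ms] = "event from app server A"
row_by_timestamp[timestamp_ms] = "event from app server B"
print(len(row_by_timestamp))  # 1

# Time-based UUID as the column key: the node part differs per server,
# so the keys stay distinct even when the clocks agree.
key_a = uuid.uuid1(node=0xA1)  # node values are placeholders for two machines
key_b = uuid.uuid1(node=0xB2)
row_by_timeuuid = {
    key_a: "event from app server A",
    key_b: "event from app server B",
}
print(len(row_by_timeuuid))  # 2

# The embedded timestamp is still available for time ordering.
ordered = sorted(row_by_timeuuid, key=lambda u: u.time)
```

Both events survive in the second row, and the `time` field of each key still allows ordering by write time.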
Leverage Wide Rows for Ordering, Grouping, and Filtering
This goes along with the above practice. When actual data is stored in column names,
we end up with wide rows.
Benefits of wide rows
Since column names are stored physically sorted, wide rows
enable ordering of data and hence efficient filtering (range scans).
You'll still be able to efficiently look up an individual column
within a wide row, if needed.
If data is queried together, you can group that data up in a single
wide row that can be read back efficiently, as part of a single
query. As an example, for tracking or monitoring some time series
data, we can group data by hour/date/machines/event types
(depending on the requirements) in a single wide row, with each
column containing granular data or roll-ups.
Wide row column families are heavily used (with composite
columns) to build custom indexes in CFDB.
As a side benefit, you can de-normalize a one-to-many
relationship as a wide row without data duplication.
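The grouping and range-scan benefits can be sketched for the time series case mentioned above. This is a toy in-memory model under stated assumptions, not a CFDB API: the `write` and `slice_columns` helpers, the `machine:hour` row key layout, and the sample values are all hypothetical.

```python
import bisect
from collections import defaultdict

# row key -> list of (column name, value) pairs kept sorted by column name
rows = defaultdict(list)


def write(machine, ts, metric):
    """Store one sample in the wide row for this machine and hour."""
    row_key = f"{machine}:{ts[:13]}"   # e.g. "web01:2024-05-01T10"
    column_name = ts[11:]              # e.g. "10:05:00"
    bisect.insort(rows[row_key], (column_name, metric))


def slice_columns(row_key, start, end):
    """Return columns with start <= name < end via binary search on sorted names."""
    cols = rows[row_key]
    lo = bisect.bisect_left(cols, (start,))
    hi = bisect.bisect_left(cols, (end,))
    return cols[lo:hi]


write("web01", "2024-05-01T10:05:00", 0.41)
write("web01", "2024-05-01T10:45:00", 0.77)
write("web01", "2024-05-01T10:20:00", 0.63)
write("web01", "2024-05-01T11:02:00", 0.30)  # lands in the next hour's row

result = slice_columns("web01:2024-05-01T10", "10:15:00", "10:50:00")
print(result)  # [('10:20:00', 0.63), ('10:45:00', 0.77)]
```

Everything for one machine and hour sits in a single row, so one read returns it, and the range scan touches only the columns between the two bounds because the names are already sorted.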
 