Data Modeling Approaches for Big Data and Analytics Solutions - Big Data Imperatives

Storing Values in Column Names

It's a common practice with a CFDB design to store a value (actual data) in the column name (a.k.a. column key), and even to leave the column value field empty if there is nothing else to store. One motivation for this practice is that column names are stored physically sorted, but column values are not.
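To make the idea concrete, here is a minimal in-memory sketch in Python. It is not a CFDB client: the `Row` class, the `followers` row, and the user names are hypothetical stand-ins chosen only to show data living in column names while the values stay empty.

```python
import bisect


class Row:
    """Toy stand-in for one CFDB row: columns are kept sorted by name."""

    def __init__(self):
        self._names = []   # column names, always kept in sorted order
        self._values = {}  # column name -> column value (may be empty)

    def put(self, name, value=b""):
        # Insert the name at its sorted position the first time it is seen.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def columns(self):
        # Columns come back in column-name order, not insertion order.
        return [(n, self._values[n]) for n in self._names]


# Hypothetical row holding the followers of one user.
followers = Row()
for user in ["carol", "bob", "dave"]:
    followers.put(user)  # the data is the column name; the value stays empty

column_names = [name for name, _ in followers.columns()]
```

Reading the row back yields the names in sorted order regardless of the order in which they were written, which is the property the practice relies on.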

Notes

• The maximum column key (and row key) size is 64 KB. However, don't store something like “item description” as the column key!

• Don't use a timestamp alone as a column key. You might get colliding timestamps from two or more app servers writing to CFDB. Prefer a time-UUID instead.

• The maximum column value size is 2 GB. But because there is no streaming and the whole value is fetched into heap memory when requested, limit the size to only a few MB.
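The timestamp collision mentioned in the notes can be illustrated with a short Python sketch. Plain dictionaries stand in for a row's columns, and the timestamp value and node IDs are made up for the example; only `uuid.uuid1` from the standard library is a real API here.

```python
import uuid

# Hypothetical: two app servers write within the same millisecond.
same_instant_ms = 1_700_000_000_000

# Timestamp alone as the column name: the second write overwrites the first.
by_timestamp = {}
by_timestamp[same_instant_ms] = "event from server A"
by_timestamp[same_instant_ms] = "event from server B"

# Time-based (version 1) UUID as the column name: the node ID is part of the
# UUID, so writes from different servers get distinct column names.
by_timeuuid = {}
by_timeuuid[uuid.uuid1(node=0x0000000000A1)] = "event from server A"
by_timeuuid[uuid.uuid1(node=0x0000000000B2)] = "event from server B"
```

After running this, `by_timestamp` holds a single entry while `by_timeuuid` holds both events.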

Leverage Wide Rows for Ordering, Grouping, and Filtering

This goes along with the above practice. When actual data is stored in column names, we end up with wide rows.

Benefits of wide rows

• Since column names are stored physically sorted, wide rows enable ordering of data and hence efficient filtering (range scans). You'll still be able to efficiently look up an individual column within a wide row, if needed.

• If data is queried together, you can group that data up in a single wide row that can be read back efficiently, as part of a single query. As an example, for tracking or monitoring some time series data, we can group data by hour/date/machines/event types (depending on the requirements) in a single wide row, with each column containing granular data or roll-ups.

• Wide row column families are heavily used (with composite columns) to build custom indexes in CFDB.

• As a side benefit, you can de-normalize a one-to-many relationship as a wide row without data duplication.
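The ordering and grouping benefits can be sketched in Python with a sorted list standing in for one wide row. The row key, the "mm:ss" column naming scheme, and the readings are assumptions made for illustration; a real CFDB would perform the slice on the server side.

```python
import bisect

# One wide row per machine and hour; hypothetical row key "web01:2024-01-01T10".
# Column names are "mm:ss" offsets within the hour and are kept sorted.
columns = sorted([
    ("31:02", 0.91),
    ("05:10", 0.42),
    ("48:30", 0.37),
    ("17:45", 0.58),
])
names = [name for name, _ in columns]


def slice_columns(start, end):
    """Range scan: return the columns with start <= name < end."""
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_left(names, end)
    return columns[lo:hi]


# Because the names are sorted, a time window is one contiguous slice.
second_quarter = slice_columns("15:00", "30:00")
second_half = slice_columns("30:00", "60:00")
```

Here the whole hour is read from a single row, and narrowing to a time window only requires locating the two boundaries in the sorted column names.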
