The lesson here is to use composite keys: take care of load balancing by
placing the UID at the beginning, put the timestamp in the key, and append additional
information at the end. Note that the ordering of the fields inside the composite key
reflects the types of queries we anticipate.
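The key layout described above can be sketched as follows. This is only an illustration under stated assumptions: the 3-byte UID width, the function names, and the representation of tags as integer UID pairs are hypothetical choices, not taken from the text.

```python
import struct

UID_WIDTH = 3  # assumed UID width in bytes; the text does not state it


def encode_uid(uid):
    """Encode an integer UID as a fixed-width big-endian byte string."""
    return uid.to_bytes(UID_WIDTH, "big")


def build_row_key(metric_uid, base_ts, tags):
    """Build a composite key: UID first, then timestamp, then extra info.

    metric_uid -- integer UID placed at the beginning of the key
    base_ts    -- Unix timestamp in seconds, packed on 4 bytes
    tags       -- dict mapping tag-key UID to tag-value UID (hypothetical shape)
    """
    key = encode_uid(metric_uid) + struct.pack(">I", base_ts)
    # Sort tag pairs so the same tag set always yields the same key.
    for tag_key_uid, tag_value_uid in sorted(tags.items()):
        key += encode_uid(tag_key_uid) + encode_uid(tag_value_uid)
    return key


row_key = build_row_key(1, 1357027200, {2: 5, 1: 9})
print(row_key.hex())
```

Because every field has a fixed width and is big-endian, comparing two keys byte by byte compares them field by field, in the order the fields appear.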
The timestamp
The timestamp is a Unix epoch value in seconds, encoded on 4 bytes. Rows are
broken up into one-hour increments, reflected by the timestamp in each row key. Thus, each
timestamp is normalized to the start of its hour, for example, 2013-01-01 08:00:00.
This avoids stuffing too many data points into a single row, as that would affect
region distribution. Note, however, that a row can still hold a large number of data points
if the frequency of data generation is high.
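The normalization to an hour boundary is simple integer arithmetic. A minimal sketch, assuming UTC timestamps; the function names are hypothetical:

```python
import struct
from datetime import datetime, timezone

SECONDS_PER_HOUR = 3600


def base_hour(ts):
    """Round a Unix timestamp (seconds) down to the start of its hour."""
    return ts - (ts % SECONDS_PER_HOUR)


def offset_in_hour(ts):
    """Seconds elapsed since the start of the hour (0..3599)."""
    return ts % SECONDS_PER_HOUR


def encode_timestamp(ts):
    """Pack the timestamp as a 4-byte big-endian unsigned integer."""
    return struct.pack(">I", ts)


ts = int(datetime(2013, 1, 1, 8, 17, 42, tzinfo=timezone.utc).timestamp())
bucket = base_hour(ts)
print(datetime.fromtimestamp(bucket, tz=timezone.utc))  # start of the 08:00 hour
print(offset_in_hour(ts), len(encode_timestamp(bucket)))
```

All data points whose timestamps fall in the same hour share one base timestamp and therefore one row; only the offset within the hour differs.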
Also, since HBase sorts the data on the row key, the data for the same metric and
time bucket, but with different tags, will be grouped together for efficient queries.
This assumes that the number of tags is small, and indeed OpenTSDB limits it to
eight tags.
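To see why sorted row keys help here, consider a toy model in which a sorted Python list stands in for the table. The key layout (3-byte metric UID, 4-byte hour, one 3-byte tag UID) and the helper names are assumptions made for illustration only.

```python
from bisect import bisect_left


def make_key(metric_uid, base_ts, tag_uid):
    """Toy row key: 3-byte metric UID + 4-byte hour + 3-byte tag UID."""
    return (metric_uid.to_bytes(3, "big")
            + base_ts.to_bytes(4, "big")
            + tag_uid.to_bytes(3, "big"))


def scan_prefix(sorted_keys, prefix):
    """Return all keys starting with prefix, reading one contiguous range."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches


HOUR = 1357027200
keys = sorted([
    make_key(2, HOUR, 1),
    make_key(1, HOUR + 3600, 1),
    make_key(1, HOUR, 2),
    make_key(1, HOUR, 1),
])
# Metric 1, one hour bucket, any tag: the matching rows are adjacent.
prefix = make_key(1, HOUR, 0)[:7]
for key in scan_prefix(keys, prefix):
    print(key.hex())
```

The scan stops at the first non-matching key, which is what makes a query for one metric and time bucket across all tag combinations a single range read.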
When storing time series data, implement the following best practices:
• Store a reasonable time interval per row. The amount of data should not
make the table too tall and thin or too narrow and wide. One hour was
chosen here.
• Use tags to store the time interval designation.
• Use your own data encoding, since we deal with binary data here.
• Take advantage of the natural sorting of columns in the row.
• Design for efficient access.
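Two of these practices, custom binary encoding and relying on the natural sort order of columns, can be illustrated together. The sketch below uses a hypothetical encoding (a 2-byte big-endian offset as the column qualifier and an 8-byte float as the value); it is not presented as the actual OpenTSDB format.

```python
import struct


def encode_column(offset_seconds, value):
    """Encode one data point as a (qualifier, cell value) pair of bytes."""
    if not 0 <= offset_seconds < 3600:
        raise ValueError("offset must fall inside the one-hour row")
    qualifier = struct.pack(">H", offset_seconds)  # 2 bytes, big-endian
    cell = struct.pack(">d", value)                # 8-byte float
    return qualifier, cell


def decode_column(qualifier, cell):
    """Inverse of encode_column."""
    return struct.unpack(">H", qualifier)[0], struct.unpack(">d", cell)[0]


# Points arrive out of order; a dict stands in for one row's columns.
row = dict(encode_column(o, v) for o, v in [(1800, 2.5), (5, 1.0), (3599, 3.0)])

# Sorting qualifiers as raw bytes yields chronological order.
for qualifier in sorted(row):
    print(decode_column(qualifier, row[qualifier]))
```

Big-endian encoding is the design choice that matters: it makes byte-wise ordering agree with numeric ordering, so the store's own sorting returns the points in time order with no extra work.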
Compactions
Why is compaction required? The answer is to reduce storage: the row key is
stored again with every column, so merging a row's columns into one removes that
repetition. If compactions have been enabled for a TSD, a row might be compacted
after its base hour has passed or after a query has run over the row. The lesson here
is to keep compactions, both minor and major, in mind in your design, because they
will affect performance.
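The storage argument can be made concrete with a rough sketch. It assumes that compacting a row means concatenating its column qualifiers and values into a single column, and it counts only key, qualifier, and value bytes, ignoring any other per-cell overhead. The function names and sizes are hypothetical.

```python
def compact_row(columns):
    """Merge all columns of a row into one (qualifier, value) pair.

    columns -- dict mapping qualifier bytes to value bytes
    """
    qualifiers = sorted(columns)  # keep the points in qualifier order
    merged_qualifier = b"".join(qualifiers)
    merged_value = b"".join(columns[q] for q in qualifiers)
    return merged_qualifier, merged_value


def stored_bytes(row_key, columns):
    """Rough size estimate: the row key is repeated once per column."""
    return sum(len(row_key) + len(q) + len(v) for q, v in columns.items())


row_key = bytes(13)  # placeholder 13-byte key
columns = {
    b"\x00\x05": bytes(8),
    b"\x07\x08": bytes(8),
    b"\x0e\x0f": bytes(8),
}
merged_qualifier, merged_value = compact_row(columns)
before = stored_bytes(row_key, columns)
after = stored_bytes(row_key, {merged_qualifier: merged_value})
print(before, after)
```

With three columns the key is stored once instead of three times; the saving grows with the number of data points in the row.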
 