Database Reference
In-Depth Information
Figure 3-3. Use of a wide table for NoSQL time series data. The key structure is illustrative; in real
applications, a binary format might be used, but the ordering properties would be the same.
Because both HBase and MapR-DB store data ordered by the primary key, the key design
shown in Figure 3-3 will cause rows containing data from a single time series to wind up
near one another on disk. This design means that retrieving data from a particular time series
for a time range will involve largely sequential disk operations and therefore will be much
faster than would be the case if the rows were widely scattered. In order to gain the perform-
ance benefits of this table structure, the number of samples in each time window should be
substantial enough to cause a significant decrease in the number of rows that need to be re-
trieved. Typically, the time window is adjusted so that 100-1,000 samples are in each row.
NoSQL Database with Hybrid Design
The table design shown in Figure 3-3 can be improved by collapsing all of the data for a row
into a single data structure known as a blob. This blob can be highly compressed so that less
data needs to be read from disk. Also, if HBase is used to store the time series, having a
single column per row decreases the per-column overhead incurred by the on-disk format
that HBase uses, which further increases performance. The hybrid-style table structure is
shown in Figure 3-4 , where some rows have been collapsed using blob structures and some
have not.
Search WWH ::




Custom Search