Advanced Data Modeling - HBase Essentials


Consider another use case: processing streaming events, a classic example of time series data. The streaming source could be anything, for example, real-time stock exchange feeds, data coming from a sensor, or a network monitoring system in a production environment. When designing the table structure for time series data, we usually consider using the event's time as the row key.

In HBase, rows are kept sorted by row key and stored in regions, each of which covers a distinct key range bounded by a specific start and stop key. Because time series keys increase sequentially, all new data is written to the same region, the one holding the end of the key range, and that region is hosted on a single region server. That server becomes a hotspot, and the read/write performance of the whole cluster drops to the speed of a single server.
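To make the hotspot concrete, the following plain-Java sketch simulates region assignment without an HBase cluster. It assumes a hypothetical table pre-split at three timestamp boundaries and models each region by its start key in a `TreeMap`, so `floorEntry` returns the region whose range contains a key. The class name, region names, and boundaries are illustrative only; numeric order of non-negative `long` keys stands in for HBase's byte order, which matches it for big-endian encoded non-negative timestamps.

```java
import java.util.Map;
import java.util.TreeMap;

// Simulates how monotonically increasing time-based row keys map to regions.
// Regions are modeled by their start keys; this is an illustration, not HBase code.
class HotspotDemo {
    // Hypothetical region boundaries (start keys) for a pre-split table.
    static final TreeMap<Long, String> REGIONS = new TreeMap<>();
    static {
        REGIONS.put(Long.MIN_VALUE, "region-1"); // stands in for the empty start key of the first region
        REGIONS.put(1_000_000L, "region-2");
        REGIONS.put(2_000_000L, "region-3");
        REGIONS.put(3_000_000L, "region-4");
    }

    // Returns the region whose [startKey, nextStartKey) range contains the row key.
    static String regionFor(long rowKey) {
        return REGIONS.floorEntry(rowKey).getValue();
    }

    // Counts how many of n consecutive events starting at startTs land in each region.
    static Map<String, Integer> distribute(long startTs, int n) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            counts.merge(regionFor(startTs + i), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // New events always carry the latest timestamps, so they all hit the last region.
        System.out.println(distribute(5_000_000L, 10_000)); // prints {region-4=10000}
    }
}
```

Even when the last region grows and splits, new writes move on to the new last region, so the write load still sits on one server at a time.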

To avoid writing all the data to a single region server, a simple solution is to prefix the row key with a nonsequential prefix, which distributes the data across all the region servers instead of just one. The following approaches are commonly considered:

• Salting: A salt prefix is added to the row key so that the data is stored across all the region servers. For example, we can generate a pseudo-random salt number by taking the hash code of the timestamp modulo the number of region servers. The drawback of this approach is that data reads are also distributed across the region servers, so the salt must be handled in client code for get() and scan() operations: a get() has to recompute the salt to build the full row key, and a time-range scan has to be issued once per salt value and the results merged. An example of salting is shown in the following code:

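import org.apache.hadoop.hbase.util.Bytes;

// numRegionServers is a placeholder for the number of region servers (salt buckets);
// floorMod keeps the salt non-negative even when hashCode() is negative
int saltNumber = Math.floorMod(Long.hashCode(timestamp), numRegionServers);
byte[] rowkey = Bytes.add(Bytes.toBytes(saltNumber), Bytes.toBytes(timestamp));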

• Hashing: This approach is not suited for time series data. Hashing the timestamp destroys the ordering of consecutive values, so the data for a time range can no longer be read with a single range scan.
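The read-side cost of salting can be sketched as follows. This is a hypothetical helper in plain Java, assuming a fixed number of salt buckets (`BUCKETS`) and a key layout of a 4-byte big-endian salt followed by the 8-byte big-endian timestamp, which matches what HBase's `Bytes.toBytes(int)` and `Bytes.toBytes(long)` produce. It uses `java.nio.ByteBuffer` so it runs without the HBase client. A time-range read turns into one start/stop row pair per bucket; the client would issue one scan per pair and merge the results by timestamp.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch of salted row keys and the per-bucket scan ranges a client must issue.
// BUCKETS and the key layout are assumptions for illustration.
class SaltedKeys {
    static final int BUCKETS = 4; // hypothetical: number of region servers / salt buckets

    // Salt derived deterministically from the timestamp, kept in [0, BUCKETS).
    static int salt(long timestamp) {
        return Math.floorMod(Long.hashCode(timestamp), BUCKETS);
    }

    // Row key = 4-byte salt + 8-byte timestamp, both big-endian.
    static byte[] rowKey(long timestamp) {
        return ByteBuffer.allocate(12).putInt(salt(timestamp)).putLong(timestamp).array();
    }

    // For a time range [startTs, stopTs), one [startRow, stopRow) pair per bucket.
    static List<byte[][]> scanRanges(long startTs, long stopTs) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            byte[] start = ByteBuffer.allocate(12).putInt(b).putLong(startTs).array();
            byte[] stop = ByteBuffer.allocate(12).putInt(b).putLong(stopTs).array();
            ranges.add(new byte[][] {start, stop});
        }
        return ranges;
    }

    public static void main(String[] args) {
        long ts = 1_400_000_000_000L;
        System.out.println("salt for " + ts + " = " + salt(ts));
        // A single logical time-range read needs BUCKETS physical scans.
        System.out.println("scans needed: " + scanRanges(ts, ts + 60_000).size());
    }
}
```

Because each bucket keeps its rows in timestamp order, every per-bucket scan stays a cheap range scan; the extra work is the fan-out and the client-side merge, which is exactly the trade-off described for salting.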

HBase does not provide direct support for secondary indexes, but many use cases require them, such as:

• Looking up a cell using coordinates other than the row key, column family name, and qualifier

• Scanning a range of rows from the table ordered by the secondary index
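One common workaround, shown here only as a hedged illustration, is a hand-maintained index table whose row key starts with the indexed value, so a prefix scan on that table returns data-table row keys ordered by the secondary value. The plain-Java sketch below models both tables as sorted maps, standing in for HBase tables that keep rows sorted by key; the `sensorId` value, the `|` separator, and all names are hypothetical. In HBase the two puts would go to different tables and would not be atomic together, and stale index entries after updates are not handled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Models a data table and a hand-maintained index table as sorted maps.
// Illustrative only; names and key layout are assumptions.
class SecondaryIndexSketch {
    // Data table: row key -> sensorId column value.
    final TreeMap<String, String> dataTable = new TreeMap<>();
    // Index table: row key "sensorId|dataRowKey" -> data-table row key.
    final TreeMap<String, String> indexTable = new TreeMap<>();

    // Writes the data row and its index entry (two separate puts in HBase).
    void put(String rowKey, String sensorId) {
        dataTable.put(rowKey, sensorId);
        indexTable.put(sensorId + "|" + rowKey, rowKey);
    }

    // Finds data-table row keys for a sensorId via a prefix range scan on the index.
    List<String> rowsForSensor(String sensorId) {
        String start = sensorId + "|";
        String stop = sensorId + "}"; // '}' sorts right after '|', closing the prefix range
        return new ArrayList<>(indexTable.subMap(start, stop).values());
    }

    public static void main(String[] args) {
        SecondaryIndexSketch tables = new SecondaryIndexSketch();
        tables.put("1001", "s2");
        tables.put("1002", "s1");
        tables.put("1003", "s2");
        System.out.println(tables.rowsForSensor("s2")); // prints [1001, 1003]
    }
}
```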
