Consider another use case: processing streaming events, which is a classic example
of time series data. The source of the streaming data could be anything, for example,
real-time stock exchange feeds, readings from a sensor, or output from the network
monitoring system of a production environment. When designing the table
structure for time series data, we usually use the event's time as the row key.
In HBase, rows are stored in regions, each of which holds a distinct, sorted range of
row keys bounded by a specific start key and stop key. Sequentially increasing time
series keys therefore all fall into the same region, so all the incoming data is
ingested by the single region server that hosts that region, creating a hotspot. This
skewed distribution of data limits the read/write performance of the cluster to the
speed of a single server.
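The hotspot effect can be illustrated with a small simulation. The following is a minimal sketch, not HBase code: the class name RegionHotspotDemo, the four-region layout, and the use of plain long values as row keys are all assumptions made for illustration (HBase itself compares row keys as byte arrays).

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionHotspotDemo {
    // Hypothetical layout: each entry maps a region's start key to its name.
    // A region owns the keys from its start key up to the next start key.
    static final TreeMap<Long, String> REGIONS = new TreeMap<>();
    static {
        REGIONS.put(0L, "region-0");
        REGIONS.put(1000L, "region-1");
        REGIONS.put(2000L, "region-2");
        REGIONS.put(3000L, "region-3");
    }

    // Find the region whose start key is the greatest one <= rowKey.
    // Assumes rowKey >= 0, the smallest start key in this layout.
    static String regionFor(long rowKey) {
        return REGIONS.floorEntry(rowKey).getValue();
    }

    // Count how many of `count` consecutive keys starting at `first`
    // land in each region.
    static Map<String, Integer> writesPerRegion(long first, int count) {
        Map<String, Integer> hits = new TreeMap<>();
        for (long key = first; key < first + count; key++) {
            hits.merge(regionFor(key), 1, Integer::sum);
        }
        return hits;
    }

    public static void main(String[] args) {
        // 100 consecutive "timestamps" all fall into the last region.
        System.out.println(writesPerRegion(3500L, 100)); // prints {region-3=100}
    }
}
```

Every write in the run lands in region-3, while the other three regions sit idle.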
To keep data from being written to a single region server, an easy
solution is to prefix the row key with a nonsequential prefix so that the
data is distributed over all the region servers instead of just one. There are other
approaches as well:
Salting : A salting prefix can be prepended to the row key to ensure that
the data is stored across all the region servers. For example, we can generate
a salt number by taking the hash code of the timestamp modulo the number of
region servers. The drawback of this approach
is that reads are also spread across the region servers, which must be
handled in client code for get() and scan() operations. An example of
salting is shown in the following code:
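// Bytes is org.apache.hadoop.hbase.util.Bytes.
// numRegionServers stands for <number of region servers>.
// Math.floorMod keeps the salt non-negative even when the hash code is negative.
int saltNumber = Math.floorMod(Long.hashCode(timestamp), numRegionServers);
byte[] rowkey = Bytes.add(Bytes.toBytes(saltNumber), Bytes.toBytes(timestamp));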
Hashing : This approach is not suited for time series data. Hashing the
timestamp destroys the ordering of consecutive values, so reading the data
between two points in time with a range scan is no longer possible.
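The client-side read handling that salting requires can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: the class SaltedRowKeyDemo and its method names are hypothetical, java.nio.ByteBuffer stands in for HBase's Bytes utility so the code runs without HBase on the classpath, and timestamps are assumed to be non-negative so that their big-endian bytes sort in time order.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class SaltedRowKeyDemo {
    // Derive a salt bucket in [0, buckets) from the timestamp's hash code.
    static int saltFor(long timestamp, int buckets) {
        return Math.floorMod(Long.hashCode(timestamp), buckets);
    }

    // Row key layout: 4-byte big-endian salt, then 8-byte big-endian timestamp.
    static byte[] rowKey(int salt, long timestamp) {
        return ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
                .putInt(salt)
                .putLong(timestamp)
                .array();
    }

    // Key used when writing: the salt is computed from the timestamp itself.
    static byte[] saltedRowKey(long timestamp, int buckets) {
        return rowKey(saltFor(timestamp, buckets), timestamp);
    }

    // A time-range read needs one [start, stop) key pair per salt bucket,
    // because rows for the range are scattered over every bucket.
    static List<byte[][]> scanRanges(long startTs, long stopTs, int buckets) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int salt = 0; salt < buckets; salt++) {
            ranges.add(new byte[][] { rowKey(salt, startTs), rowKey(salt, stopTs) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        int buckets = 4; // assumed number of region servers
        for (long ts = 1000L; ts < 1004L; ts++) {
            System.out.println(ts + " -> bucket " + saltFor(ts, buckets));
        }
        System.out.println(scanRanges(1000L, 2000L, buckets).size() + " scans needed");
    }
}
```

Because the salt here is derived deterministically from the timestamp, a get() for a known timestamp can recompute the salt and address a single row. A scan over a time range, in contrast, has to run once per bucket, and the client then merges the results back into time order.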
HBase does not provide direct support for secondary indexes, but there are many
use cases that require them, such as:
• A cell lookup using coordinates other than the row key, column family name,
and qualifier
• Scanning a range of rows from the table ordered by the secondary index
 