As you can see, well logs record a certain measurement (in this case, sonic
waves) on the millisecond scale. So, for the sake of our example, let's imagine that
we are recording the data every millisecond and that the HBase table contents look
similar to this:
Key (time, milliseconds)   Family:Column name             Family:Column name
1402282760                 Logs:Lith, Logs:SW, Logs:Res   Other:Depth, Other:acceleration
1402282761                 Logs:Lith, Logs:SW, Logs:Res   Other:Depth, Other:acceleration
1402282762                 etc.                           etc.
Okay, so this looks great and clear. You can get the data for any millisecond in one read.
You can also scan a range of key values. Since all the rows that are stored in HBase
are already sorted by key, a scan over a contiguous key range reads adjacent rows
and is fast.
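The two access patterns just described, a point read by key and a scan over a key range, can be sketched with a toy in-memory table. This is only a model of the sorted-key layout, not the HBase client API: the class `SortedTable`, its methods, and the stored values are hypothetical and chosen for illustration. The scan is assumed to include the start key and exclude the stop key.

```python
import bisect


class SortedTable:
    """Toy model of a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys in ascending order
        self._rows = {}  # row key -> {"Family:Column": value}

    def put(self, key, columns):
        """Insert or update a row, keeping the key list sorted."""
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def get(self, key):
        """Point read: fetch one row by its exact key."""
        return self._rows.get(key)

    def scan(self, start_key, stop_key):
        """Range read over [start_key, stop_key), returned in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


table = SortedTable()
for offset in range(5):
    key = 1402282760 + offset
    # Placeholder values for illustration only, not real well-log data.
    table.put(key, {"Logs:SW": offset * 0.1, "Other:Depth": 1000 + offset})

row = table.get(1402282761)                   # one read for one timestamp
window = table.scan(1402282761, 1402282764)   # adjacent rows, already in order
```

Because the keys are kept in order, the scan only has to locate the start position and then read neighboring rows, which is the property the paragraph above relies on.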
However, there are two problems with this approach. They are as follows:
• The first problem is that it can overload (hotspot) a few of the region
servers: because the keys only ever increase, all writes are concentrated in
the regions that hold the newest keys, and no data is recorded in the other
regions. Similarly, typical reads against the most recent data query only a
small number of regions.
• The second problem with this approach is that you are storing relatively
few columns in each row. This might be very inefficient, because very little
information is read at one time and because of the presence of too many
bloom filter values.
Either of these potential problems can kill the performance of our application. It is,
therefore, important to understand both well.
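The first problem can be illustrated with a small simulation. It is a sketch under stated assumptions: the split points in `SPLITS` are made up, and `region_for` and `write_load` are hypothetical helpers that only mimic how a key range is assigned to a region. None of this is HBase code.

```python
import bisect
from collections import Counter

# Hypothetical split points dividing the key space into four regions.
# Region 0 holds keys below SPLITS[0], region i holds keys in
# [SPLITS[i-1], SPLITS[i]), and the last region is open-ended.
SPLITS = [1402282000, 1402282250, 1402282500]


def region_for(key):
    """Return the index of the region whose key range contains `key`."""
    return bisect.bisect_right(SPLITS, key)


def write_load(keys):
    """Count how many writes each region receives."""
    return Counter(region_for(k) for k in keys)


# New measurements always carry the latest timestamps, so every
# incoming key is larger than all of the existing split points.
incoming = range(1402282760, 1402282760 + 1000)
load = write_load(incoming)
```

With these assumed splits, all 1000 writes land in region 3 and regions 0 to 2 receive nothing. Reads of the most recent data would hit that same region.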
Avoiding region hotspotting
This addresses our first problem: monotonically increasing key values are bad. Why
is that?
This is very well explained by Ikai Lan, who was a Google engineer when he wrote
the explanation and who currently works in Developer Relations at Google
NYC. Ikai was also an early inspiration for the doodles and cartoons used in this
topic and in my other big data cartoon series, which can be found at
http://shmsoft.blogspot.com/search/label/Hadoop%20cartoons, so he deserves a special
acknowledgement.