We will approach this chapter in the following manner:
• Discuss the keys used for storing large files
• Discuss how and where to store large files
• What are the possible solutions?
• What are the advantages and disadvantages of each of these solutions?
• What are the performance parameters at which each solution begins to deteriorate?
• Who uses what?
• Who uses what?
Even though some companies' solutions might not be available as
open source code, we can always benefit by learning from them.
Let's discuss each point by proceeding in the order we just listed.
Storing files using keys
Each file that you want to store as an object in HBase needs to be stored using a key.
We will also retrieve it using this key. Where do you get these keys from?
We have already discussed, in Chapter 3, Using HBase Tables for Single Entities, why
we should not ask the database to generate these keys. There, we said that the database
is distributed. If key generation is kept consistent by centralizing it, it becomes a
bottleneck. If key generation is distributed, either the keys will be inconsistent or
the central management of the keys will slow down HBase. In Chapter 3, we found a
solution for the specific case under discussion: since our usernames had to be unique
by the nature of the requirements, we simply used the username as a unique key.
This is not ideal, however. Firstly, it is not generic, and secondly, it is out of our
control and might lead to a database imbalance (as we will discuss in Chapter 5, Time
Series Data). Therefore, now is a good time to introduce a general solution for the key
generation problem: the use of UUIDs.
Using UUID
A universally unique identifier (UUID) is an identifier standard that is used in
software construction, standardized by the Open Software Foundation (OSF) as
part of the Distributed Computing Environment (DCE). On Microsoft Windows
platforms, it is known as a globally unique identifier (GUID).
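To make the idea concrete, here is a minimal sketch of generating a UUID on the client and turning it into a 16-byte value suitable for use as a row key (HBase row keys are plain byte arrays). It uses only the standard java.util.UUID class; the helper class UuidRowKeys and its method names are our own illustrative assumptions, not part of HBase or of this book's code.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper for illustration only; the class and method names are assumptions.
final class UuidRowKeys {
    private UuidRowKeys() {}

    // Pack the 128 bits of a UUID into 16 bytes, most significant bits first.
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Rebuild the UUID from a 16-byte key produced by toBytes().
    static UUID fromBytes(byte[] key) {
        if (key.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + key.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(key);
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    // Generate a fresh random (version 4) UUID locally, with no coordination
    // between clients, and return it in row-key form.
    static byte[] newRowKey() {
        return toBytes(UUID.randomUUID());
    }
}
```

Because each client generates its keys independently, no central key service is involved. The resulting byte array could then be used as the row key when writing the file to HBase, and the same bytes used again to read it back.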
 