Here are the possible problems with storing somewhat large files in HBase (just to
warn you):
• HBase periodically compacts its entire on-disk index to keep random
lookups against that index efficient. Compacting large files can be very
wasteful: many video formats are already compressed, so there is nothing
for the database to squeeze out. It will still rewrite the blobs to a new
location in pursuit of performance, but in fact it is just moving the same
bytes around.
• Committing a large blob to disk is unnecessarily expensive. First, the
entire blob is written to the commit log, and then it is written out again
to the actual HBase file. Each of these writes is replicated three times
by HDFS. Also, remember that you might want to write to another data
center too, which multiplies the copies further. All this unnecessary
duplication can cost a lot.
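A quick back-of-the-envelope calculation shows how the copies add up. This sketch assumes the two local writes described above (commit log plus HBase file) and the default HDFS replication factor of 3; the numbers are illustrative, not measured:

```python
def bytes_written(blob_size, hdfs_replication=3, data_centers=1):
    """Rough write amplification for one blob committed to HBase.

    Each data center writes the blob twice (commit log + HBase file),
    and HDFS replicates every write `hdfs_replication` times.
    Compactions would rewrite the data yet again, so this is a lower bound.
    """
    local_writes = 2  # commit log + HBase file
    return blob_size * local_writes * hdfs_replication * data_centers

one_gb = 1024 ** 3
print(bytes_written(one_gb))                  # 6 GB of writes in one data center
print(bytes_written(one_gb, data_centers=2))  # 12 GB if mirrored to a second one
```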
Practical recommendations
In your actual work, use the following multistep approach: start with the
simplest option that works, and move to a more complex one only if it does not.
• If the file is small, store it directly in HBase (make the size threshold
a parameter, as you might want to experiment with it).
• Otherwise, store the actual file in HDFS, and keep its path, together
with the description, in HBase.
• If neither fits, implement your own library that breaks files into small,
manageable pieces and stores or retrieves them for you.
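The three steps above can be sketched as follows. The function and constant names here are hypothetical; real code would call the HBase and HDFS client APIs where the comments indicate:

```python
CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB pieces for very large files (tunable)

def plan_storage(size, threshold=10 * 1024 * 1024):
    """Decide where a file of `size` bytes should live.

    `threshold` is the configurable small-file cutoff from the first step;
    the 10 MB default is an assumption you would tune experimentally.
    """
    if size <= threshold:
        return "hbase-blob"      # step 1: store the bytes directly in HBase
    return "hdfs-with-pointer"   # step 2: file in HDFS, path + description in HBase

def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Step 3: break a file into manageable pieces.

    Each piece carries an index so the file can be reassembled in order.
    """
    return [(i // chunk_size, data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

# Example: a tiny "file" split into 2-byte chunks and reassembled.
chunks = split_into_chunks(b"abcde", chunk_size=2)
print(chunks)                              # [(0, b'ab'), (1, b'cd'), (2, b'e')]
print(b"".join(c for _, c in chunks))      # b'abcde'
```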
A practical lab
Let's walk through an application that gives you a practical case of large files—a
video site. Suppose you need to store your videos and describe them in an
appropriate HBase table, and you have already chosen to store the videos elsewhere,
not in your database. How will you design your table?
At this point, I encourage you to close the book and design your table
yourself. Then, come back and compare your design with the one I give
you.