Here are the possible problems with storing somewhat large files in HBase (just to
warn you):
• HBase periodically compacts its entire on-disk index to keep random
lookups against that index efficient. Compacting large files can be very
wasteful: many video formats are already compressed, so there is nothing
for the database to squeeze out. It will still rewrite the blobs to a new
location in pursuit of performance, but in fact it is just moving the same
bytes around.
• Committing a large blob to disk is unnecessarily expensive. First, the
entire blob is written to the commit log, and then it is written out again
to the actual HBase file. Each of these writes is replicated three times
by HDFS. Also, remember that you might want to write to another data
center too, which multiplies the copies further. All this unnecessary
duplication can cost a lot.
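A quick back-of-the-envelope calculation shows how the copies add up. This sketch assumes the two local writes described above (commit log plus HBase file) and the default HDFS replication factor of 3; the numbers are illustrative, not measured:

```python
def bytes_written(blob_size, hdfs_replication=3, data_centers=1):
    """Rough write amplification for one blob committed to HBase.

    Each data center writes the blob twice (commit log + HBase file),
    and HDFS replicates every write `hdfs_replication` times.
    Compactions would rewrite the data yet again, so this is a lower bound.
    """
    local_writes = 2  # commit log + HBase file
    return blob_size * local_writes * hdfs_replication * data_centers

one_gb = 1024 ** 3
print(bytes_written(one_gb))                  # 6 GB of writes in one data center
print(bytes_written(one_gb, data_centers=2))  # 12 GB if mirrored to a second one
```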
Practical recommendations
In your actual work, use the following multistep approach: start with the
simplest option that works, and move to a more complex one only if it does not.
• If the file is small, store it directly in HBase (make the size threshold
a parameter, as you might want to experiment with it).
• Otherwise, store the actual file in HDFS, and keep its path, together
with the description, in HBase.
• If neither fits, implement your own library that breaks files into small,
manageable pieces and stores or retrieves them for you.
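The three steps above can be sketched as follows. The function and constant names here are hypothetical; real code would call the HBase and HDFS client APIs where the comments indicate:

```python
CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB pieces for very large files (tunable)

def plan_storage(size, threshold=10 * 1024 * 1024):
    """Decide where a file of `size` bytes should live.

    `threshold` is the configurable small-file cutoff from the first step;
    the 10 MB default is an assumption you would tune experimentally.
    """
    if size <= threshold:
        return "hbase-blob"      # step 1: store the bytes directly in HBase
    return "hdfs-with-pointer"   # step 2: file in HDFS, path + description in HBase

def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Step 3: break a file into manageable pieces.

    Each piece carries an index so the file can be reassembled in order.
    """
    return [(i // chunk_size, data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

# Example: a tiny "file" split into 2-byte chunks and reassembled.
chunks = split_into_chunks(b"abcde", chunk_size=2)
print(chunks)                              # [(0, b'ab'), (1, b'cd'), (2, b'e')]
print(b"".join(c for _, c in chunks))      # b'abcde'
```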
A practical lab
Let's walk through an application that gives you a practical case of large files—a
video site. Suppose you need to store your videos and describe them in an
appropriate HBase table, and you have already chosen to store the videos elsewhere,
not in your database. How will you design your table?
At this point, I encourage you to close the book and design your table
yourself. Then, come back and compare your design with the one I give
you.