Dealing with Large Files - HBase Design Patterns


Facebook's Haystack for the storage of large files

Facebook's Haystack is described at http://www.facebook.com/note.php?note_id=76191543919. The Haystack paper describes the design considerations and the implementation work involved in storing large files in your system. You need to think about the read/write loads and the hardware on which you will run it. Most likely, you don't need all the bells and whistles of Haystack and can implement a simpler solution. Alas, Haystack is not available as open source.
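As an illustration of what a "simpler solution" might look like, here is a minimal sketch in Python. It is not Haystack's implementation; it only shows the general idea of packing many blobs into one large data file and keeping a small in-memory index of where each blob lives. The class name `AppendOnlyBlobStore`, its methods, and the file layout are all assumptions made for this example, and it ignores concerns such as crash recovery, deletion, and replication.

```python
import os


class AppendOnlyBlobStore:
    """Hypothetical store that packs many blobs into a single data file."""

    def __init__(self, path):
        self.path = path
        # In-memory index: key -> (offset, size) within the data file.
        self.index = {}
        # Create the data file if it does not exist yet.
        open(self.path, "ab").close()

    def put(self, key, data):
        """Append the blob to the end of the file and remember its location."""
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        self.index[key] = (offset, len(data))

    def get(self, key):
        """Read a blob back with one seek and one read."""
        offset, size = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)
```

A caller would create the store with a file path, call `put("photo1.jpg", image_bytes)` for each file, and later call `get("photo1.jpg")` to read it back. Whether this kind of design is enough depends on your own read/write loads and hardware, as discussed above.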

Twitter's solution for storing large files

Twitter's solution for this problem is published at https://blog.twitter.

Twitter needs to store pictures and photographs. Today, you can hardly expect people to read anything unless it is supplied with a picture. Take my own blog, http://mkerzner.blogspot.com/, for example. If one can enjoy the world's best art while reading an explanation of even the most complex law, it becomes fun and educational at the same time. So, Twitter needs to store millions of photos, and they had to design their own solution.

As always, each such solution answers a specific set of design goals, and the lesson we learn here is to define our design goals, understanding that they might well be different from anybody else's. Here are the advantages of using Twitter's solution:

• Low cost: This reduces the amount of money and time spent on storing large files
• High performance: This serves images in the low tens of milliseconds, while maintaining a throughput of hundreds of thousands of requests per second
• Easy to operate: This lets the operational overhead scale with a continuously growing infrastructure

Accordingly, Twitter created their own solution with everything in it, that is, redundant storage, multiserver communication through ZooKeeper (http://zookeeper.apache.org/), and fast serving of the images.
