Performance Tuning - Professional NoSQL - page 309

LEVERAGING BLOOM FILTERS

Bloom Filters were introduced in Chapter 13. Please review the definition if you aren't sure what they are.

A get row call in HBase currently performs a parallel N-way get of that row from all StoreFiles in a region. This implies N read requests from disk. Bloom Filters provide a lightweight in-memory structure that reduces those N disk reads to only the files likely to contain that row.
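The idea of skipping files that cannot contain a row can be sketched in a few lines. This is a minimal illustration, not HBase's implementation: the `BloomFilter`, `StoreFile`, and `get_row` names, the filter size, and the SHA-256 based hashing are all assumptions made for the example, and the "disk read" is only a counter.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; bit positions are derived from SHA-256 (an assumption)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


class StoreFile:
    """Hypothetical stand-in for an on-disk file of rows with its own filter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.disk_reads = 0

    def read(self, key):
        self.disk_reads += 1  # counts simulated disk accesses
        return self.rows.get(key)


def get_row(store_files, key, use_bloom=True):
    """Collect the row from every file, skipping files the filter rules out."""
    results = []
    for sf in store_files:
        if use_bloom and not sf.bloom.might_contain(key):
            continue  # the in-memory check avoided a disk read
        value = sf.read(key)
        if value is not None:
            results.append(value)
    return results
```

Without the filter, every file is read once per get. With it, only files whose filter answers "possibly present" are touched, at the cost of an occasional false positive.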

Reads happen in parallel, so the performance gain on an individual get is minimal. Also, read performance is dominated by disk read latency. If you replaced the parallel get with a serial get, you would see the impact of Bloom Filters on read latency.
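A toy latency model makes the parallel versus serial argument concrete. It assumes that a parallel get takes as long as its slowest file read and that a serial get takes the sum of its reads. The function name `get_latency` and the millisecond figures used with it are made up for illustration.

```python
def get_latency(read_latencies_ms, may_contain, parallel=True, use_bloom=True):
    """Model the latency of one get over several StoreFiles.

    read_latencies_ms[i] is the simulated disk read time for file i.
    may_contain[i] is the Bloom filter's answer for file i.
    """
    # With the filter on, only files that may contain the row are read.
    selected = [
        t for t, hit in zip(read_latencies_ms, may_contain)
        if hit or not use_bloom
    ]
    if not selected:
        return 0.0
    # Parallel reads finish with the slowest one; serial reads add up.
    return max(selected) if parallel else sum(selected)
```

With four files at 9, 10, 11, and 10 ms, where only the second may contain the row, the parallel get drops from 11 ms to 10 ms when the filter is used, while the serial get drops from 40 ms to 10 ms.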

Bloom Filters can be more heavyweight than your data. This is one big reason why they aren't enabled by default.
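A rough way to gauge filter overhead is the standard sizing formula for an optimally configured Bloom filter, which needs about -ln(p) / (ln 2)^2 bits per key for a false positive rate p. The sketch below applies that general formula; it is not taken from HBase, and the function names are hypothetical.

```python
import math


def bloom_bits_per_key(false_positive_rate):
    """Bits per key for an optimally sized Bloom filter (general formula)."""
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def bloom_size_bytes(num_keys, false_positive_rate):
    """Approximate filter size in bytes for num_keys entries."""
    return math.ceil(num_keys * bloom_bits_per_key(false_positive_rate) / 8)
```

At a 1 percent false positive rate this comes to a little under 10 bits per key, so the filter's footprint grows with the number of keys regardless of how small each stored value is.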

SUMMARY

This chapter presented a few perspectives on tuning the performance of parallel MapReduce-based processes. The MapReduce algorithm enables the processing of large amounts of data using commodity hardware. Scaling MapReduce algorithms requires some clever configuration, and configuring MapReduce tasks well improves performance.

The chapter presented a few generic performance-tuning tips but used Hadoop and the associated set of tools for illustration.
