16
Performance Tuning
WHAT'S IN THIS CHAPTER?
Understanding the factors that affect parallel scalable applications
Optimizing scalable processing, especially when it leverages the MapReduce model
Presenting a set of best practices for parallel processing
Illustrating a few Hadoop performance tuning tips
Today, much of the big data analysis in the world of NoSQL rests on the shoulders of the
MapReduce model of processing. Hadoop is built on it, and each NoSQL product that supports
huge data sizes leverages it. This chapter is a first look into optimizing scalable applications
and tuning the way MapReduce-style processing works on large data sets. By no means does
the chapter provide a prescriptive solution. Instead, it presents a few important concepts
and good practices to bear in mind when optimizing a scalable parallel application. Each
optimization problem is unique to its requirements and context, so a single universally
applicable solution is probably not feasible.
GOALS OF PARALLEL ALGORITHMS
MapReduce makes scalable parallel processing easier than it was in the past. By adhering
to a model in which data is not shared between parallel threads or processes, MapReduce
provides a bottleneck-free way of scaling out as workloads increase. The underlying goal at
all times is to reduce latency and increase throughput.
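To make the shared-nothing idea concrete, the following is a minimal word-count sketch written against the standard Hadoop MapReduce Java API; the class name WordCount and the command-line input and output paths are illustrative, not taken from this chapter. Each map task works on its own input split and each reduce task on its own partition of keys, so no mutable state crosses task boundaries.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Each map task processes its own input split; it shares no state
    // with any other map task running in parallel.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // emit (word, 1)
            }
        }
    }

    // Each reduce task receives all values for its own partition of keys,
    // again without sharing state with other reduce tasks.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The combiner pre-aggregates map output locally, which reduces the
        // amount of data shuffled across the network; a common optimization.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because the tasks share nothing, Hadoop can scale this job out simply by running more map and reduce tasks on more nodes as the input grows, which is exactly the behavior the rest of this chapter aims to tune.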
The Implications of Reducing Latency
Reducing latency simply means reducing the execution time of a program. The faster a program
completes — that is, the less time a program takes to produce desired results for a given set of