The chapter demonstrates a set of MapReduce use cases and shows how computations over very large data sets can be carried out with elegance. No low-level API manipulation is necessary, and there is no need to worry about resource deadlocks or starvation. In addition, keeping data and compute together reduces the effect of I/O and bandwidth limitations.
SUMMARY
MapReduce is a powerful way to process large amounts of information quickly and efficiently. Google has used it for much of its heavy lifting and has been gracious enough to share the underlying ideas with the research and developer communities. In addition, the Hadoop team has built a robust and scalable open-source infrastructure that leverages the processing model. Other NoSQL projects and vendors have also adopted MapReduce.
MapReduce is replacing SQL in many highly scalable and distributed systems that work with immense amounts of data. Its performance and “shared nothing” model prove to be a big winner over the traditional SQL model.
Writing MapReduce programs is also relatively easy because the infrastructure handles the complexity, letting a developer focus on chains of MapReduce jobs and their application to processing large amounts of data. Common MapReduce jobs can frequently be handled with existing infrastructure, such as CouchDB's built-in reducers or projects such as Apache Mahout. However, defining keys and working through the reduce logic can sometimes require careful attention.
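To make the map and reduce phases concrete, the following is a minimal in-process sketch of the MapReduce model, using the classic word-count job. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are illustrative, not part of any framework's API; a real framework such as Hadoop would distribute these steps across machines and handle the grouping between phases for you.

```python
from collections import defaultdict

def map_phase(document):
    # Map step: emit a (key, value) pair of (word, 1) for every word.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle step: group values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce step: aggregate all values emitted for one key.
    return (key, sum(values))

def run_job(documents):
    # Chain the phases: map every document, shuffle, then reduce per key.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

Note that the reduce logic here is a simple sum, but as the summary points out, choosing the right keys and reduce behavior is where real jobs need the most care.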