The chapter demonstrates a set of MapReduce use cases and shows how computations over very large data sets can be carried out with elegance. No low-level API manipulation is necessary, and there is no need to worry about resource deadlocks or starvation. In addition, keeping data and compute together reduces the effect of I/O and bandwidth limitations.
SUMMARY
MapReduce is a powerful way to process large amounts of information quickly and efficiently. Google has used it for much of its heavy lifting and has been gracious enough to share the underlying ideas with the research and developer communities. In addition, the Hadoop team has built a robust and scalable open-source infrastructure that leverages the processing model. Other NoSQL projects and vendors have also adopted MapReduce.
MapReduce is replacing SQL in many highly scalable and distributed systems that work with immense amounts of data. Its performance and “shared nothing” model prove to be a big winner over the traditional SQL model.
Writing MapReduce programs is also relatively easy because the infrastructure handles the complexity, letting a developer focus on chains of MapReduce jobs and their application to processing large amounts of data. Common MapReduce jobs can frequently be handled with existing infrastructure, such as CouchDB's built-in reducers or projects such as Apache Mahout. However, defining keys and working through the reduce logic can sometimes require careful attention.
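To make the map and reduce phases concrete, the following is a minimal in-process sketch of the MapReduce model, using the classic word-count job. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are illustrative, not part of any framework's API; a real framework such as Hadoop would distribute these steps across machines and handle the grouping between phases for you.

```python
from collections import defaultdict

def map_phase(document):
    # Map step: emit a (key, value) pair of (word, 1) for every word.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle step: group values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce step: aggregate all values emitted for one key.
    return (key, sum(values))

def run_job(documents):
    # Chain the phases: map every document, shuffle, then reduce per key.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

Note that the reduce logic here is a simple sum, but as the summary points out, choosing the right keys and reduce behavior is where real jobs need the most care.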