Chapter 9
Big Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data, which has called for a paradigm shift in
computing architectures and large-scale data processing mechanisms. MapReduce
is a simple and powerful programming model that enables easy development of
scalable parallel applications to process vast amounts of data on large clusters
of commodity machines. It isolates the application from the details of running
a distributed program, such as data distribution, scheduling, and fault
tolerance. However, the original implementation of the MapReduce framework had
some limitations that many research efforts have tackled in follow-up work
since its introduction. This chapter provides a comprehensive survey of a
family of large-scale data processing approaches and mechanisms that have been
implemented based on the original idea of the MapReduce framework and are
currently gaining considerable momentum in both the research and industrial
communities. We also cover a set of systems that have been implemented to provide
declarative programming interfaces on top of the MapReduce framework. In
addition, we review several large-scale data processing systems that resemble some
of the ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some future research directions for implementing
the next generation of MapReduce-like solutions.
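The division of labor described above, where the application supplies only the processing logic and the framework handles everything else, can be sketched in a few lines. The following is a minimal single-process Python illustration of the classic word-count job, not the API of the original framework or of any real system: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the driver merely simulates the map, shuffle (group-by-key), and reduce phases that a real runtime would distribute across a cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


# Hypothetical user-supplied map function: emits (word, 1) for every word in a line.
def map_fn(_key: int, line: str) -> Iterator[Tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1


# Hypothetical user-supplied reduce function: sums all counts emitted for one word.
def reduce_fn(word: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    yield word, sum(counts)


def run_mapreduce(
    records: Iterable[Tuple[int, str]],
    mapper: Callable[[int, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Iterator[Tuple[str, int]]],
) -> List[Tuple[str, int]]:
    """Hypothetical single-process stand-in for the framework.

    A real MapReduce runtime would partition the input, run mappers and
    reducers on many machines, and recover from failures; none of that
    is modeled here.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: apply the mapper to every input record.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            # Shuffle: group intermediate values by their key.
            groups[out_key].append(out_value)
    # Reduce phase: one reducer call per distinct key, in sorted key order.
    results: List[Tuple[str, int]] = []
    for out_key in sorted(groups):
        results.extend(reducer(out_key, groups[out_key]))
    return results


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_mapreduce(enumerate(lines), map_fn, reduce_fn))
    # prints [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

The application code consists only of `map_fn` and `reduce_fn`; everything inside `run_mapreduce` stands in for the work the framework takes off the programmer's hands, such as data distribution, scheduling, and fault tolerance, which this sketch deliberately leaves out.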
9.1 Introduction
Many enterprises continuously collect large datasets that record customer
interactions, product sales, results from advertising campaigns on the Web, and
other types of information. For example, Facebook collects 15 terabytes of data
each day into a petabyte-scale data warehouse [222]. In general, the growing
demand for large-scale data processing and data analysis applications has spurred
the development of novel solutions from both industry (e.g., web-data analysis,
click-stream analysis, network-monitoring log analysis) and the sciences (e.g.,
analysis of data