express ad hoc queries. In contrast to other systems such as Pig [109] or Hive [123],
it executes queries natively without translating them into MapReduce jobs. In par-
ticular, Dremel is designed to execute many queries that would ordinarily require a
sequence of MapReduce jobs.
2.7 CONCLUSIONS
The database community has always focused on the challenges of Big Data management, although the meaning of "big" has evolved continuously to represent different scales over time [24]. According to IBM, we currently create 2.5 quintillion bytes of data every day. These data come from many different sources and in many formats, including digital pictures, videos, posts to social media sites, intelligent sensors, purchase transaction records, and cell phone GPS signals. This is a new scale of Big Data, which is attracting huge interest from both the industrial and research communities, with the aim of creating the best means to process and analyze these data. In the last decade, the MapReduce framework has emerged as a popular mechanism for harnessing the power of large clusters of computers. It allows programmers to think in a data-centric fashion: they focus on applying transformations to sets of data records, while the details of distributed execution and fault tolerance are managed transparently by the MapReduce framework.
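The data-centric division of labor described above can be illustrated with a minimal, framework-free sketch: the programmer writes only a map function and a reduce function (word count is the canonical example), while the `run_mapreduce` driver below stands in for what a real framework such as Hadoop performs at scale, with partitioning, distributed execution, and fault tolerance. The function names here are illustrative, not part of any actual MapReduce API.

```python
from collections import defaultdict

def map_fn(record):
    # Map phase: emit an intermediate (word, 1) pair for each word in a line.
    for word in record.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Reduce phase: sum all counts emitted for a given word.
    return (key, sum(values))

def run_mapreduce(records, map_fn, reduce_fn):
    # A stand-in for the framework: in a real system, the shuffle that
    # groups intermediate values by key happens across the cluster.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

if __name__ == "__main__":
    lines = ["big data big clusters", "data records"]
    print(run_mapreduce(lines, map_fn, reduce_fn))
    # {'big': 2, 'clusters': 1, 'data': 2, 'records': 1}
```

The programmer's code never mentions machines, partitions, or failures; that separation is precisely what makes the model attractive for large clusters.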
In this chapter, we presented a survey of the MapReduce family of approaches for developing scalable data-processing systems and solutions. Although the MapReduce framework and its open-source implementation, Hadoop, are now considered mature enough to be widely used by academia and industry for building solutions in many application domains, we believe it is unlikely that MapReduce will completely replace database systems, even for data warehousing applications. We expect that the two will coexist and complement each other in different scenarios. We are also convinced that there is still room for optimization and advancement in several directions across the spectrum of the MapReduce framework, which is required to realize the vision of providing large-scale data analysis as a commodity for novice end users. For example, energy efficiency in MapReduce is an important problem that has not yet attracted sufficient attention from the research community. The traditional challenge of debugging large-scale computations on distributed systems has likewise not been given sufficient consideration by the MapReduce research community. The expressive power of the programming model is another area that requires more investigation. We also noticed that the simplicity of the MapReduce programming model raises key challenges in dealing efficiently with complex data models (e.g., nested models, XML and hierarchical models, RDF, and graphs). This limitation has created the need for a next generation of Big Data architectures and systems that can provide the required scale and performance for these domains. For example, Google has created the Dremel system [99], commercialized under the name BigQuery,* to support interactive analysis
* https://developers.google.com/bigquery/.