that although the MapReduce framework and its open source implementation, Hadoop,
are now considered sufficiently mature to be widely used by academia and industry
for developing solutions in different application domains, we believe that it is
unlikely that MapReduce will completely replace database systems, even for data
warehousing applications. We expect that they will always coexist and complement
each other in different scenarios. We are also
convinced that there is still room for further optimization and advancement in
different directions across the MapReduce framework, which is required
to bring forward the vision of providing large scale data analysis as a commodity
for novice end-users. For example, energy efficiency in MapReduce is an
important problem that has not yet attracted sufficient attention from the research
community. The traditional challenge of debugging large scale computations
on distributed systems has likewise not been given sufficient consideration by the
MapReduce research community. The expressive power of the programming model
is another area that requires more investigation.
We also noticed that the oversimplicity of the MapReduce programming model
has raised some key challenges in dealing efficiently with complex data models
(e.g., nested models, XML and hierarchical models, RDF and graphs). This limitation
has called for a next generation of big data architectures and systems that
can provide the required scale and performance attributes for these domains. For
example, Google has created the Dremel system [182, 183], commercialized under
the name BigQuery [22], to support interactive analysis of nested data. Google
has also presented the Pregel system [180], implemented as open source by the
Apache Giraph and Apache Hama projects, which uses a programming model based on
bulk synchronous parallel (BSP) for efficient and scalable processing of massive
graphs on distributed clusters of commodity machines. Recently, Twitter announced
the release of the Storm [47] system as a distributed and fault-tolerant platform
for implementing continuous and real-time processing applications over streamed
data. We believe that more of these domain-specific systems will be introduced
in the future to form the new generation of big data systems. Defining the right
and most convenient programming abstractions and declarative interfaces for these
domain-specific big data systems is another important research direction that
will need to be deeply investigated.