12 An Overview of Large-Scale Stream Processing Engines
Radwa Elshawi and Sherif Sakr
CONTENTS
12.1 Introduction
12.2 Aurora
12.3 Borealis
12.4 IBM System S and IBM Spade
12.5 Deduce
12.6 StreamCloud
12.7 Stormy
12.8 Twitter Storm
12.9 Conclusion
References
12.1 INTRODUCTION
Today's era of Big Data is witnessing a continuous increase in user and machine
connectivity, producing an overwhelming flow of data that demands a paradigm
shift in computing architecture requirements and large-scale data-processing
mechanisms. As a result, concurrent computation has been receiving increased
attention, driven by the widespread adoption of multicore processors and by
advances in cloud computing technology. For example, the MapReduce framework
was introduced as a scalable and fault-tolerant data-processing framework
that enables the processing of massive volumes of data in parallel on clusters of
horizontally scalable commodity machines. By virtue of its simplicity, scalability, and
fault tolerance, MapReduce is becoming ubiquitous and gaining significant momentum
within both industry and academia. However, the MapReduce framework, available
as open source through the Hadoop* implementation, and its related large-scale
data-processing technologies (e.g., Pig, Hive) were designed mainly to support batch
processing tasks, and they are not adequate for supporting real-time stream processing
* Hadoop: http://hadoop.apache.org/
Pig: http://pig.apache.org/
Hive: http://hive.apache.org/