An Overview of Large-Scale Stream Processing Engines - Large Scale and Big Data: Processing and Management

12 An Overview of Large-Scale Stream Processing Engines

Radwa Elshawi and Sherif Sakr

CONTENTS

12.1 Introduction
12.2 Aurora
12.3 Borealis
12.4 IBM System S and IBM Spade
12.5 Deduce
12.6 StreamCloud
12.7 Stormy
12.8 Twitter Storm
12.9 Conclusion
References

12.1 INTRODUCTION

Today's era of Big Data is witnessing a continuous increase in user and machine connectivity, which produces an overwhelming flow of data and demands a paradigm shift in computing architecture requirements and large-scale data-processing mechanisms. As a result, concurrent computation has been receiving increased attention, driven by the widespread adoption of multicore processors and the emerging advancements of cloud computing technology. For example, the MapReduce framework was introduced as a scalable and fault-tolerant data-processing framework that enables the processing of massive volumes of data in parallel on clusters of horizontally scalable commodity machines. By virtue of its simplicity, scalability, and fault tolerance, MapReduce is becoming ubiquitous and gaining significant momentum within both industry and academia. However, the MapReduce framework, available as open source through the Hadoop* implementation, and its related large-scale data-processing technologies (e.g., Pig,† Hive‡) have been designed mainly to support batch processing tasks and are not adequate for supporting real-time stream processing

* http://hadoop.apache.org/.

† http://pig.apache.org/.

‡ http://hive.apache.org/.
