An Overview of Large-Scale Stream Processing Engines - Large Scale and Big Data: Processing and Management


12.9 CONCLUSION

In this chapter, we presented an overview of a set of approaches and systems that have been proposed for developing scalable stream data-processing systems and solutions. Although we have focused on the main research and open-source projects in this domain, we also acknowledge the existence of other commercial systems and technologies such as Microsoft StreamInsight* and StreamBase.† Although the design of distributed stream processing engines has attracted the attention of the research community in the last few years, we are convinced that there is still room for further optimization and advancement in several directions. For example, defining the right and most convenient programming abstractions and standard declarative interfaces for these systems is an important research direction that will need to be tackled. In addition, designing innovative frameworks and mechanisms that combine the capabilities of large-scale distributed batch processing systems (e.g., MapReduce) with the strengths of distributed stream processing engines represents a clear gap in the area of advanced Big Data processing techniques, one that has yet to attract sufficient attention from the research community.


* http://msdn.microsoft.com/en-us/sqlserver/ee476990.aspx.

† http://www.streambase.com/.
