Database Reference
In-Depth Information
13 Advanced Algorithms for
Efficient Approximate
Duplicate Detection
in Data Streams Using
Bloom Filters
Sourav Dutta and Ankur Narang
CONTENTS
13.1 Introduction and Motivation ......................................................................... 410
13.2 Duplicate Detection Approaches and Data Structures ................................. 412
13.2.1 Database Query and Buffering Techniques ...................................... 412
13.2.2 Bloom Filters .................................................................................... 413
13.2.3 Counting and Window Bloom Filters ............................................... 413
13.2.4 Parallel Bloom Filters ....................................................................... 413
13.3 Stable Bloom Filter (SBF)............................................................................. 414
13.3.1 Stability Property .............................................................................. 415
13.4 Reservoir Sampling Mechanism ................................................................... 416
13.5 Reservoir Sampling-Based Bloom Filter (RSBF) Approach ....................... 417
13.5.1 Theoretical Framework ..................................................................... 418
13.5.1.1 False-Positive Rate ............................................................. 419
13.5.1.2 False-Negative Rate ........................................................... 421
13.5.1.3 Stability Factor ................................................................... 423
13.6 Experiments and Results .............................................................................. 424
13.6.1 Setting of Parameters ........................................................................ 424
13.6.2 Results and Analysis ......................................................................... 425
13.6.3 Quality Comparison ......................................................................... 426
13.6.4 Detailed Analysis .............................................................................. 430
13.7 Conclusions ................................................................................................... 432
References .............................................................................................................. 432
409
 
Search WWH ::




Custom Search