the Reduce step. One such application is the so-called “attribution” process.
In this setting, users identified by a unique identifier engage in a number of
events before possibly engaging in an event of interest, called a “conversion”
in this context. The attribution process is concerned with the events that
occurred within some window before the final conversion event. Because most
users will not convert (low single-digit percentages of conversion are
normal), a Bloom Filter containing the IDs of the users who did convert can
be used in the Map step of an attribution Map-Reduce job. Even with a
relatively high false positive rate of 10 percent, this filtering still tends to
reduce the amount of data sent to the Reduce step by 80 to 90 percent.
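The following is a minimal sketch of that map-side filtering idea, assuming a simple Bloom filter built with hashlib-based double hashing; the field names (user_id, event), the BloomFilter class, and the map_attribution helper are illustrative rather than taken from any particular Map-Reduce framework.

import hashlib
import math


class BloomFilter:
    """A simple Bloom filter sized for an expected item count and error rate."""

    def __init__(self, expected_items, error_rate=0.1):
        # Standard sizing formulas: m bits and k hash functions.
        self.m = max(1, int(-expected_items * math.log(error_rate) / (math.log(2) ** 2)))
        self.k = max(1, int(round((self.m / expected_items) * math.log(2))))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-1 digest (double hashing).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def map_attribution(events, converters):
    """Map step: emit (user_id, event) only for users who appear to have converted.

    False positives let a few non-converters through, but those can be
    discarded in the Reduce step; true converters are never dropped.
    """
    for event in events:
        if event["user_id"] in converters:
            yield event["user_id"], event


if __name__ == "__main__":
    # Build the filter from the (small) set of converting user IDs.
    converted_ids = ["user-42", "user-99"]
    converters = BloomFilter(expected_items=len(converted_ids), error_rate=0.1)
    for uid in converted_ids:
        converters.add(uid)

    stream = [
        {"user_id": "user-42", "event": "click"},
        {"user_id": "user-7", "event": "impression"},   # most likely filtered out
        {"user_id": "user-99", "event": "impression"},
    ]
    for key, value in map_attribution(stream, converters):
        print(key, value)

Because the filter never produces false negatives, every converting user's events reach the Reduce step; the only cost of the 10 percent false positive rate is a small amount of extra data that the reducer must ignore.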
Conclusion
One of the main challenges of processing streaming data is keeping up
with the number of events to be processed. Even with the advent of the
high-performance solid-state disk (SSD), this data must generally be stored
in main memory (RAM) to achieve acceptable performance. If the data to be
stored is simple, such as sums or averages, this does not present a problem.
When the data to be stored becomes more complicated, such as the number of
unique values in the stream, problems arise. Attempting to store
the data directly can result in storage requirements that are proportional to
the size of the data stream and can quickly overrun the available RAM.
This chapter has presented a number of methods for storing certain values
such as sets and their size in such a way that the memory usage is controlled
by the application rather than the data, ensuring that RAM requirements
can be met. The downside of these techniques is that they introduce
estimation error into the computed values. In some cases this error may not be
tolerable, but because the error is also a function of the storage allotted, the
application can usually tune it to acceptable levels.