Database Reference
In-Depth Information
Chapter 9
Statistical Approximation of Streaming Data
Many elementary properties of data streams can be obtained through basic
counting methods. Totals, averages, minimums, maximums, and, to some
extent, other order statistics can be computed with O(1) updates and O(1)
storage space. Most systems stop at these elementary values because the
low-latency requirements and potentially unbounded storage make
computing more complicated values prohibitively expensive.
Chapters 9and 10 tackle thisproblem from astatistical perspective. Statistics
is a field that was, essentially, developed to deal with problems that occur
when it is too costly or time consuming to perform a census of the entire
population. Instead, the field of Statistics has developed a toolkit that allows
a sample to be used to make inferences of the population using the toolkit
provided by the mathematics of probability. This chapter provides a brief
introduction to statistical methods and concepts, including a useful
foundation in probability and statistics used to answer questions about the
data rather than simply present tabulated results.
The techniques in this chapter are not specifically related to the analysis
of streaming data. They are just as applicable to finite datasets regardless
of size. Of course, they can also be applied to data streams with some
modifications. Later in this chapter is a discussion of methods of efficiently
sampling from streams of data. Statistical analysis can be applied to these
samples to conduct in-depth analyses, perform forecasts, and so on.
Theframeworkofprobabilityisalsousefulforthedevelopmentofspecialized
data structures for maintaining summaries of stream data. These specialized
structures are discussed in-depth in Chapter 10 and rely onan understanding
of the behavior of random values, which is discussed in the remainder of this
chapter.
Numerical Libraries
These next few chapters make heavy use of numerical functions not found
in the standard mathematics library of most languages. The examples in this
chapter are largely written in Java and use the open source Colt numerical
Search WWH ::




Custom Search