Database Reference
In-Depth Information
The remaining methods, found in the code included with this chapter,
are implemented in a similar manner.
Stochastic Optimization
Although not precisely a form of aggregation, Stochastic optimization bears
mentioning. Whereas aggregation and counting can be used to compute
simple values like averages and standard deviations, which are discussed in
much more detail in the next chapter, there is a family of techniques jointly
known as stochastic optimization methods .
The most famous of these methods is stochastic gradient descent, which
is a widely used technique in machine learning when one is dealing with
very large datasets. The basic idea is that there is some function F(B) that
can be written down as the sum of a function f i (B) applied to each data
point. The name of the game is to find the value of B that minimizes this
function. This can be accomplished in a streaming (or “online”) setting
by computing the gradient of f i (B) and subtracting it from the current
estimate of B after multiplying it by a value r , known as the “learning rate.”
This is essentially a special case of summation and can be used to compute a
variety of interesting values.
A similar technique called stochastic averaging is often used in this way to
compute the median of a data stream. The median is defined as the value
of a dataset such that, when sorted, 50 percent of the data is smaller than
the value, and 50 percent of the data is larger than the value. Ordinarily this
is difficult to calculate on a stream because it requires the collection and
sorting of all the data.
To approximate this value using stochastic optimization, the value of
interest is the current estimate of the median M . If the next observed value
in the stream is larger than M , increase it by r . If it is smaller, decrease the
estimate by r . When M is close to the median, it increases as often as it
decreases, and therefore it stabilizes:
public class MedianEstimator {
double rate = 1.0;
double current = Double. POSITIVE_INFINITY ;
Search WWH ::




Custom Search