Figure 10-6. A windowed stream with a window duration of 3 batches and a slide
duration of 2 batches; every two time steps, we compute a result over the previous 3
time steps
Example 10-17. How to use window() to count data over a window in Scala
val accessLogsWindow = accessLogsDStream.window(Seconds(30), Seconds(10))
val windowCounts = accessLogsWindow.count()
Example 10-18. How to use window() to count data over a window in Java
JavaDStream<ApacheAccessLog> accessLogsWindow = accessLogsDStream.window(
    Durations.seconds(30), Durations.seconds(10));
// count() on a JavaDStream yields a stream of Long values, one per window
JavaDStream<Long> windowCounts = accessLogsWindow.count();
While we can build all other windowed operations on top of window(), Spark Streaming
provides a number of other windowed operations for efficiency and convenience.
First, reduceByWindow() and reduceByKeyAndWindow() allow us to perform reductions
on each window more efficiently. They take a single reduce function to run on
the whole window, such as +. In addition, they have a special form that allows Spark
to compute the reduction incrementally, by considering only which data is coming
into the window and which data is going out. This special form requires an inverse of
the reduce function, such as - for +. It is much more efficient for large windows if
your function has an inverse (see Figure 10-7).
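
To make the incremental idea concrete, here is a minimal plain-Java sketch that does not use Spark at all. It simulates a stream as an array of per-batch values and compares recomputing each window from scratch with updating the previous result by adding the batches that enter and removing the batches that leave. The class name IncrementalWindowDemo, the method names, and the sample data are hypothetical and chosen only for illustration; this is not how Spark implements the feature internally, and it assumes the slide is no larger than the window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongBinaryOperator;

// Illustrative simulation only; names and data are assumptions, not Spark APIs.
final class IncrementalWindowDemo {

    // Naive form: reduce every batch in the window each time the window slides.
    static List<Long> naive(long[] batches, int window, int slide,
                            LongBinaryOperator reduce) {
        List<Long> results = new ArrayList<>();
        for (int end = window; end <= batches.length; end += slide) {
            long acc = batches[end - window];
            for (int i = end - window + 1; i < end; i++) {
                acc = reduce.applyAsLong(acc, batches[i]);
            }
            results.add(acc);
        }
        return results;
    }

    // Incremental form: start from the previous window's result, apply the
    // reduce function to entering batches and the inverse to leaving batches.
    static List<Long> incremental(long[] batches, int window, int slide,
                                  LongBinaryOperator reduce,
                                  LongBinaryOperator inverse) {
        if (slide < 1 || slide > window) {
            throw new IllegalArgumentException("sketch assumes 1 <= slide <= window");
        }
        List<Long> results = new ArrayList<>();
        if (batches.length < window) {
            return results;
        }
        // The first window has no previous result, so it is reduced in full.
        long acc = batches[0];
        for (int i = 1; i < window; i++) {
            acc = reduce.applyAsLong(acc, batches[i]);
        }
        results.add(acc);
        for (int end = window + slide; end <= batches.length; end += slide) {
            for (int i = end - slide; i < end; i++) {
                acc = reduce.applyAsLong(acc, batches[i]);   // data coming in
            }
            for (int i = end - slide - window; i < end - window; i++) {
                acc = inverse.applyAsLong(acc, batches[i]);  // data going out
            }
            results.add(acc);
        }
        return results;
    }

    public static void main(String[] args) {
        long[] batches = {4, 1, 3, 2, 5, 6, 2};  // made-up per-batch counts
        // Window of 3 batches, slide of 2 batches, as in Figure 10-6.
        System.out.println(naive(batches, 3, 2, Long::sum));
        System.out.println(incremental(batches, 3, 2, Long::sum, (a, b) -> a - b));
    }
}
```

Both calls produce the same sums. The naive form touches every batch in the window on each slide, while the incremental form touches only the batches that enter and leave, which is why the inverse form pays off as the window grows relative to the slide.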
 