Figure 10-6. A windowed stream with a window duration of 3 batches and a slide
duration of 2 batches; every two time steps, we compute a result over the previous 3
time steps
Example 10-17. How to use window() to count data over a window in Scala
val accessLogsWindow = accessLogsDStream.window(Seconds(30), Seconds(10))
val windowCounts = accessLogsWindow.count()
Example 10-18. How to use window() to count data over a window in Java
JavaDStream<ApacheAccessLog> accessLogsWindow =
    accessLogsDStream.window(Durations.seconds(30), Durations.seconds(10));
// count() on a JavaDStream yields a stream of Long values, one count per window
JavaDStream<Long> windowCounts = accessLogsWindow.count();
While we can build all other windowed operations on top of window(), Spark
Streaming provides a number of other windowed operations for efficiency and
convenience. First, reduceByWindow() and reduceByKeyAndWindow() allow us to
perform reductions on each window more efficiently. They take a single reduce
function to run on the whole window, such as +. In addition, they have a
special form that allows Spark to compute the reduction incrementally, by
considering only which data is coming into the window and which data is going
out. This special form requires an inverse of the reduce function, such as -
for +. It is much more efficient for large windows if your function has an
inverse (see Figure 10-7).
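The incremental idea can be illustrated without Spark at all. The following is a minimal plain-Java sketch, not Spark's actual implementation: the class name WindowReduceSketch and its methods naive() and incremental() are hypothetical names introduced here, and each batch is assumed to have already been reduced to a single Long. It compares recomputing every window from scratch against updating the previous result with the batches that enter and leave the window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical sketch of naive vs. incremental window reduction (no Spark dependency). */
final class WindowReduceSketch {

    /** Recomputes each window from scratch: `window` reduce calls per result. */
    static List<Long> naive(List<Long> batches, int window, int slide,
                            BinaryOperator<Long> reduce, long zero) {
        List<Long> results = new ArrayList<>();
        for (int end = window; end <= batches.size(); end += slide) {
            long acc = zero;
            for (int i = end - window; i < end; i++) {
                acc = reduce.apply(acc, batches.get(i));
            }
            results.add(acc);
        }
        return results;
    }

    /**
     * Updates the previous window's result instead of recomputing it:
     * applies `reduce` to batches entering the window and `inverse` to
     * batches leaving it, so each result costs about 2 * slide calls.
     * Assumes slide <= window, so consecutive windows overlap or touch.
     */
    static List<Long> incremental(List<Long> batches, int window, int slide,
                                  BinaryOperator<Long> reduce,
                                  BinaryOperator<Long> inverse, long zero) {
        if (slide > window) {
            throw new IllegalArgumentException("this sketch assumes slide <= window");
        }
        List<Long> results = new ArrayList<>();
        if (batches.size() < window) {
            return results;
        }
        // The first window has no previous result, so it is computed in full.
        long acc = zero;
        for (int i = 0; i < window; i++) {
            acc = reduce.apply(acc, batches.get(i));
        }
        results.add(acc);
        for (int end = window + slide; end <= batches.size(); end += slide) {
            // Batches that entered the window since the previous result.
            for (int i = end - slide; i < end; i++) {
                acc = reduce.apply(acc, batches.get(i));
            }
            // Batches that fell out of the window since the previous result.
            for (int i = end - slide - window; i < end - window; i++) {
                acc = inverse.apply(acc, batches.get(i));
            }
            results.add(acc);
        }
        return results;
    }
}
```

With per-batch counts 4, 2, 7, 1, 3, 5, a window of 3 batches, and a slide of 2 batches (the shape shown in Figure 10-6), calling naive(batches, 3, 2, Long::sum, 0L) and incremental(batches, 3, 2, Long::sum, (x, y) -> x - y, 0L) both return [13, 11]. The incremental version touches only the two entering and two leaving batches per result, which is why the saving grows with the window size.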