Spark Streaming - Learning Spark

Database Reference

In-Depth Information

Figure 10-6. A windowed stream with a window duration of 3 batches and a slide

duration of 2 batches; every two time steps, we compute a result over the previous 3

time steps

Example 10-17. How to use window() to count data over a window in Scala

val accessLogsWindow = accessLogsDStream . window ( Seconds ( 30 ), Seconds ( 10 ))

val windowCounts = accessLogsWindow . count ()

Example 10-18. How to use window() to count data over a window in Java

JavaDStream < ApacheAccessLog > accessLogsWindow = accessLogsDStream . window (

Durations . seconds ( 30 ), Durations . seconds ( 10 ));

JavaDStream < Integer > windowCounts = accessLogsWindow . count ();

While we can build all other windowed operations on top of window() , Spark Stream‐

ing provides a number of other windowed operations for efficiency and convenience.

First, reduceByWindow() and reduceByKeyAndWindow() allow us to perform reduc‐

tions on each window more efficiently. They take a single reduce function to run on

the whole window, such as + . In addition, they have a special form that allows Spark

to compute the reduction incrementally , by considering only which data is coming

into the window and which data is going out. This special form requires an inverse of

the reduce function, such as - for + . It is much more efficient for large windows if

your function has an inverse (see Figure 10-7 ).

Search WWH ::

Custom Search

Home