Database Reference
In-Depth Information
Stateless Transformations
Stateless transformations, some of which are listed in
Table 10-1
, are simple RDD
transformations being applied on every batch—that is, every RDD in a DStream. We
have already seen
filter()
in
Figure 10-3
. Many of the RDD transformations dis‐
cussed in Chapters
3
and
4
are also available on DStreams. Note that key/value
DStream transformations like
reduceByKey()
are made available in Scala by
import
StreamingContext._
. In Java, as with RDDs, it is necessary to create a
JavaPairD
Stream
using
mapToPair()
.
Table 10-1. Examples of stateless DStream transformations (incomplete list)
Function name
Purpose
Scala example
Signature of user-
supplied function on
DStream[T]
Apply a function to each
element in the DStream and
return a DStream of the result.
map()
ds.map(x => x + 1)
f: (T) → U
Apply a function to each
element in the DStream and
return a DStream of the
contents of the iterators
returned.
flatMap()
ds.flatMap(x => x.split(" "))
f: T → Itera
ble[U]
Return a DStream consisting of
only elements that pass the
condition passed to filter.
filter()
ds.filter(x => x != 1)
f: T → Boolean
Change the number of
partitions of the DStream.
N/A
reparti
tion()
ds.repartition(10)
Combine values with the same
key in each batch.
reduceBy
Key()
ds.reduceByKey(
(x, y) => x + y)
f: T, T → T
Group values with the same
key in each batch.
N/A
groupBy
Key()
ds.groupByKey()
Keep in mind that although these functions look like they're applying to the whole
stream, internally each DStream is composed of multiple RDDs (batches), and each
stateless transformation applies
separately
to each RDD. For example,
reduceByKey()
will reduce data within each time step, but not across time steps. The stateful trans‐
formations we cover later allow combining data across time.