Database Reference
In-Depth Information
Stateless Transformations
Stateless transformations, some of which are listed in Table 10-1 , are simple RDD
transformations being applied on every batch—that is, every RDD in a DStream. We
have already seen filter() in Figure 10-3 . Many of the RDD transformations dis‐
cussed in Chapters 3 and 4 are also available on DStreams. Note that key/value
DStream transformations like reduceByKey() are made available in Scala by import
StreamingContext._ . In Java, as with RDDs, it is necessary to create a JavaPairD
Stream using mapToPair() .
Table 10-1. Examples of stateless DStream transformations (incomplete list)
Function name
Purpose
Scala example
Signature of user-
supplied function on
DStream[T]
Apply a function to each
element in the DStream and
return a DStream of the result.
map()
ds.map(x => x + 1)
f: (T) → U
Apply a function to each
element in the DStream and
return a DStream of the
contents of the iterators
returned.
flatMap()
ds.flatMap(x => x.split(" "))
f: T → Itera
ble[U]
Return a DStream consisting of
only elements that pass the
condition passed to filter.
filter()
ds.filter(x => x != 1)
f: T → Boolean
Change the number of
partitions of the DStream.
N/A
reparti
tion()
ds.repartition(10)
Combine values with the same
key in each batch.
reduceBy
Key()
ds.reduceByKey(
(x, y) => x + y)
f: T, T → T
Group values with the same
key in each batch.
N/A
groupBy
Key()
ds.groupByKey()
Keep in mind that although these functions look like they're applying to the whole
stream, internally each DStream is composed of multiple RDDs (batches), and each
stateless transformation applies separately to each RDD. For example, reduceByKey()
will reduce data within each time step, but not across time steps. The stateful trans‐
formations we cover later allow combining data across time.
 
Search WWH ::




Custom Search