Database Reference
In-Depth Information
General transformations
The Spark Streaming API also exposes a general transform function that gives us ac-
cess to the underlying RDD for each batch in the stream. That is, where the higher level
functions such as map transform a DStream to another DStream, transform allows us
to apply functions from an RDD to another RDD. For example, we can use the RDD
join operator to join each batch of the stream to an existing RDD that we computed sep-
arately from our streaming application (perhaps, in Spark or some other system).
Note
The full list of transformations and further information on each of them is provided in the
Spark documentation at http://spark.apache.org/docs/latest/streaming-programming-
guide.html#transformations-on-dstreams .
Actions
While some of the operators we have seen in Spark Streaming, such as count , are not
actions as in the batch RDD case, Spark Streaming has the concept of actions on
DStreams. Actions are output operators that, when invoked, trigger computation on the
DStream. They are as follows:
print : This prints the first 10 elements of each batch to the console and is typic-
ally used for debugging and testing.
saveAsObjectFile , saveAsTextFiles , and saveAsHadoopFiles :
These functions output each batch to a Hadoop-compatible filesystem with a file-
name (if applicable) derived from the batch start timestamp.
forEachRDD : This operator is the most generic and allows us to apply any ar-
bitrary processing to the RDDs within each batch of a DStream. It is used to ap-
ply side effects , such as saving data to an external system, printing it for testing,
exporting it to a dashboard, and so on.
Tip
Note that like batch processing with Spark, DStream operators are lazy . In the same way
in which we need to call an action, such as count , on an RDD to ensure that processing
takes place, we need to call one of the preceding action operators in order to trigger com-
putation on a DStream. Otherwise, our streaming application will not actually perform
any computation.
Search WWH ::




Custom Search