Database Reference
In-Depth Information
Caching and fault tolerance with Spark Streaming
Like Spark RDDs, DStreams can be cached in memory. The use cases for caching are sim-
ilar to those for RDDs—if we expect to access the data in a DStream multiple times (per-
haps performing multiple types of analysis or aggregation or outputting to multiple external
systems), we will benefit from caching the data. Stateful operators, which include window
functions and updateStateByKey , do this automatically for efficiency.
Recall that RDDs are immutable datasets and are defined by their input data source and lin-
eage —that is, the set of transformations and actions that are applied to the RDD. Fault tol-
erance in RDDs works by recreating the RDD (or partition of an RDD) that is lost due to
the failure of a worker node.
As DStreams are themselves batches of RDDs, they can also be recomputed as required to
deal with worker node failure. However, this depends on the input data still being available.
If the data source itself is fault-tolerant and persistent (such as HDFS or some other fault-
tolerant data store), then the DStream can be recomputed.
If data stream sources are delivered over a network (which is a common case with stream
processing), Spark Streaming's default persistence behavior is to replicate data to two
worker nodes. This allows network DStreams to be recomputed in the case of failure. Note,
however, that any data received by a node but not yet replicated might be lost when a node
fails.
Spark Streaming also supports recovery of the driver node in the event of failure. However,
currently, for network-based sources, data in the memory of worker nodes will be lost in
this case. Hence, Spark Streaming is not fully fault-tolerant in the face of failure of the
driver node or application.
Note
See http://spark.apache.org/docs/latest/streaming-programming-
guide.html#caching—persistence and http://spark.apache.org/docs/latest/streaming-
programming-guide.html#fault-tolerance-properties for more details.
Search WWH ::




Custom Search