Spark also allows us to control how cached/persisted RDDs are evicted from the
cache. By default, Spark uses an LRU (least recently used) cache. Spark will also
explicitly evict RDDs older than a certain time period if you set spark.cleaner.ttl.
By preemptively evicting from the cache RDDs that we are unlikely to need again, we
may be able to reduce GC pressure.
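To make the two eviction policies concrete, here is a toy Python model of a cache that combines LRU eviction with age-based (TTL) expiry. It is a simplified sketch, not Spark's actual block manager: the class name `LruTtlCache`, its `capacity` and `ttl_seconds` parameters, and the injectable `clock` are hypothetical, and a real Spark cache is bounded by available storage memory rather than by an entry count. The `ttl_seconds` parameter plays a role only loosely analogous to spark.cleaner.ttl.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, List, Optional


class LruTtlCache:
    """Hypothetical toy cache that evicts by recency (LRU) and by age (TTL).

    Illustration only; this is not how Spark's BlockManager is implemented.
    """

    def __init__(self, capacity: int, ttl_seconds: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        # Hypothetical parameter, loosely analogous to spark.cleaner.ttl.
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # Maps key -> (value, insertion time); dict order tracks recency of use.
        self._entries: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def _expire_old(self) -> None:
        # Drop every entry older than the TTL, no matter how recently it was used.
        if self.ttl_seconds is None:
            return
        now = self.clock()
        stale = [k for k, (_, created) in self._entries.items()
                 if now - created > self.ttl_seconds]
        for k in stale:
            del self._entries[k]

    def put(self, key: Hashable, value: Any) -> None:
        self._expire_old()
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = (value, self.clock())
        # Evict least recently used entries until we are back within capacity.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def get(self, key: Hashable) -> Optional[Any]:
        self._expire_old()
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def keys(self) -> List[Hashable]:
        self._expire_old()
        return list(self._entries.keys())


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    cache = LruTtlCache(capacity=2, ttl_seconds=10, clock=lambda: now[0])
    cache.put("rdd1", "a")
    cache.put("rdd2", "b")
    cache.get("rdd1")       # rdd1 becomes the most recently used entry
    cache.put("rdd3", "c")  # over capacity, so rdd2 (least recently used) is evicted
    print(cache.keys())     # ['rdd1', 'rdd3']
    now[0] = 11.0
    cache.put("rdd4", "d")  # rdd1 and rdd3 are now older than 10 s and expire
    print(cache.keys())     # ['rdd4']
```

In this model the TTL check ignores recent use: an entry read a moment ago still expires once it is older than the limit, whereas LRU eviction only kicks in when the cache is full.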
Conclusion
In this chapter, we have seen how to work with streaming data using DStreams. Since
DStreams are composed of RDDs, the techniques and knowledge you have gained
from the earlier chapters remain applicable to streaming and real-time applications.
In the next chapter, we will look at machine learning with Spark.