Tip
Note that the preceding code example is, strictly speaking, not very scalable, as it requires
collecting all the data to the driver. We can use Spark's mean function for numeric RDDs
to compute the mean in a distributed way, but at the time of writing there is no equivalent
median function for RDDs. We can work around this by writing our own, or by computing the
median of a sample of the dataset drawn with the sample function (we will see more of this
in the upcoming chapters).
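
The sampling approach described in this tip might look roughly like the following in PySpark. This is only a sketch under stated assumptions: the helper name sampled_median and its fraction and seed defaults are hypothetical and do not come from the original example; only the RDD methods sample, collect, and mean are Spark's own.

```python
import statistics


def sampled_median(rdd, fraction=0.1, seed=42):
    """Approximate the median of a numeric RDD from a random sample.

    Hypothetical helper (not part of Spark). Only the sampled values are
    collected to the driver, not the whole dataset.
    """
    # RDD.sample(withReplacement, fraction, seed) returns a new RDD that
    # holds roughly `fraction` of the original elements.
    sample = rdd.sample(False, fraction, seed).collect()
    if not sample:
        raise ValueError("sample is empty; try a larger fraction")
    return statistics.median(sample)


# Usage, assuming an existing SparkContext named `sc` (an assumption):
#   values = sc.parallelize([float(x) for x in range(100000)])
#   values.mean()                   # exact mean, computed on the cluster
#   sampled_median(values, 0.01)    # approximate median from a ~1% sample
```

The result is an estimate: a larger fraction gives a closer approximation at the cost of collecting more data to the driver.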