Obtaining, Processing, and Preparing Data with Spark - Machine Learning with Spark

Tip

Note that the preceding code example is, strictly speaking, not very scalable, as it requires collecting all the data to the driver. We can use Spark's mean function for numeric RDDs to compute the mean, but no median function is currently available. We can solve this by writing our own or by computing the median on a sample of the dataset created using the sample function (we will see more of this in the upcoming chapters).
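
As a rough illustration of the sampling approach, the following minimal sketch estimates a median from a random sample of a numeric RDD, so that only the sampled values are collected to the driver. The helper name approx_median, its default fraction and seed, and the years RDD in the usage comments are assumptions made for this example and do not come from the original text; the result is an approximation whose quality depends on the sample size.

```python
import statistics


def approx_median(rdd, fraction=0.1, seed=42):
    """Estimate the median of a numeric RDD from a random sample.

    Hypothetical helper: only the sampled values reach the driver,
    so the result approximates the true median.
    """
    # RDD.sample(withReplacement, fraction, seed) returns a smaller RDD;
    # collect() then brings just that sample back to the driver.
    sampled = rdd.sample(False, fraction, seed).collect()
    if not sampled:
        raise ValueError("sample is empty; try a larger fraction")
    return statistics.median(sampled)


# Usage, assuming an existing SparkContext `sc` (names are illustrative):
#   years = sc.parallelize([1995, 1996, 1994, 1997, 1995])
#   mean_year = years.mean()                       # computed by Spark
#   median_year = approx_median(years, fraction=0.5)
```

A larger fraction gives a more reliable estimate at the cost of moving more data to the driver, so the value is worth tuning to the size of the dataset.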
