Building a Regression Model with Spark - Machine Learning with Spark

From the plots of the log and square root transformations, we can see that both result in a
more even distribution relative to the raw values. While they are still not normally
distributed, they are a lot closer to a normal distribution when compared to the original
target variable.

Distribution of square-root-transformed target variable values

Impact of training on log-transformed targets

So, does applying these transformations have any impact on model performance? Let's
evaluate the various metrics we used previously on log-transformed data as an example.
We will do this first for the linear model by applying NumPy's log function to the
label field of each LabeledPoint in the RDD. Here, we will only transform the target
variable, and we will not apply any transformations to the features.
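
The label-only transformation described above can be sketched as follows. This is a minimal illustration rather than the book's own listing: so that it runs without Spark or NumPy installed, it uses a hypothetical namedtuple stand-in for LabeledPoint, made-up records, and the standard library's math.log in place of NumPy's log (the two agree for positive scalar values). The names log_transform, inverse_transform, data, and data_log are assumptions introduced for this sketch.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for pyspark.mllib.regression.LabeledPoint, so the
# sketch runs without a Spark installation.
LabeledPoint = namedtuple("LabeledPoint", ["label", "features"])


def log_transform(lp):
    """Return a copy of lp with a log-transformed label; features are untouched."""
    # The log is only defined for strictly positive labels.
    return LabeledPoint(math.log(lp.label), lp.features)


def inverse_transform(value):
    """Map a value on the log scale back to the original target scale."""
    return math.exp(value)


# Made-up records for illustration only.
data = [
    LabeledPoint(16.0, [1.0, 0.0, 0.24]),
    LabeledPoint(40.0, [0.0, 1.0, 0.22]),
    LabeledPoint(1.0, [1.0, 0.0, 0.20]),
]

# With a real RDD of LabeledPoint records, the equivalent step would be
# a map over the RDD, for example: data_log = data.map(log_transform)
data_log = [log_transform(lp) for lp in data]

for original, transformed in zip(data, data_log):
    print(original.label, "->", round(transformed.label, 4))
```

A model trained on data_log predicts values on the log scale, so both predictions and labels would need to be passed through inverse_transform before computing error metrics that are meant to be compared with those from the untransformed target.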
