From the plots of the log and square root transformations, we can see that both produce a
more even distribution than the raw values. Although neither result is normally distributed,
both are much closer to a normal distribution than the original target variable.
Distribution of square-root-transformed target variable values
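As a rough numerical counterpart to the plots, the sketch below compares the sample skewness of a right-skewed variable before and after each transformation. The data is synthetic (drawn from a log-normal distribution purely for illustration) and the skewness helper is our own; neither comes from the original dataset or text.

```python
import numpy as np


def skewness(x):
    """Sample skewness: third central moment divided by the variance to the power 1.5."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)


# ASSUMPTION: synthetic, strictly positive, right-skewed stand-in for the target variable.
rng = np.random.default_rng(42)
targets = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

skew_raw = skewness(targets)
skew_sqrt = skewness(np.sqrt(targets))
skew_log = skewness(np.log(targets))

print("raw skewness: ", skew_raw)
print("sqrt skewness:", skew_sqrt)
print("log skewness: ", skew_log)
```

A skewness closer to zero indicates a more symmetric distribution. On this synthetic sample the log transformation removes almost all of the skew by construction, which will not generally hold for real data.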
Impact of training on log-transformed targets
So, does applying these transformations have any impact on model performance? Let's
evaluate the metrics we used previously on log-transformed data as an example.
We will do this first for the linear model by applying the numpy log function to the
label field of each LabeledPoint in the RDD. Here, we will transform only the target
variable and leave the features unchanged.
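A minimal sketch of this label transformation follows. So that it runs without a Spark installation, it uses a hypothetical namedtuple stand-in for LabeledPoint and a plain Python list in place of the RDD; the sample values and the helper names log_label and to_original_scale are illustrative assumptions, not taken from the original text.

```python
from collections import namedtuple

import numpy as np

# ASSUMPTION: lightweight stand-in for pyspark.mllib.regression.LabeledPoint,
# used only so that this sketch runs without Spark.
LabeledPoint = namedtuple("LabeledPoint", ["label", "features"])


def log_label(lp):
    """Return a copy of lp with a log-transformed label and untouched features.

    Labels must be strictly positive, since np.log(0) is -inf.
    """
    return LabeledPoint(float(np.log(lp.label)), lp.features)


def to_original_scale(log_value):
    """Invert the log transformation, for example on a model prediction."""
    return float(np.exp(log_value))


# Illustrative records only; the numbers are made up.
data = [
    LabeledPoint(16.0, [1.0, 0.0, 0.24]),
    LabeledPoint(40.0, [0.0, 1.0, 0.22]),
    LabeledPoint(1.0, [0.0, 0.0, 0.80]),
]

# With a real RDD of LabeledPoint records this would be: data_log = data.map(log_label)
data_log = [log_label(lp) for lp in data]

for original, transformed in zip(data, data_log):
    print(original.label, "->", transformed.label)
```

A model trained on the transformed records predicts on the log scale, so both predictions and labels would need to be mapped back with np.exp (as in to_original_scale) before computing error metrics that are comparable with the earlier results.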