error between what our model predicts and the actual outcomes observed. This process is called model fitting, training, or optimization.
More formally, we seek to find the weight vector that minimizes the sum, over all the training examples, of the loss (or error) computed from some loss function. The loss function takes the weight vector, the feature vector, and the actual outcome for a given training example as input and outputs the loss. In fact, the loss function itself is effectively specified by the link function; hence, for a given type of classification or regression (that is, a given link function), there is a corresponding loss function.
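To make the objective concrete, the following is a minimal, self-contained sketch in plain Scala (not MLlib's own API) of the quantity being minimized: the sum of a per-example loss over the training set. The Example type, the dot and logisticLoss helpers, and the convention that labels are -1 or +1 are all illustrative assumptions for this sketch.

// Illustrative types and helpers; labels are assumed to be -1.0 or +1.0
case class Example(features: Array[Double], label: Double)

def dot(w: Array[Double], x: Array[Double]): Double =
  w.zip(x).map { case (wi, xi) => wi * xi }.sum

// Logistic loss for a single example: log(1 + exp(-y * (w . x)))
def logisticLoss(w: Array[Double], ex: Example): Double =
  math.log(1.0 + math.exp(-ex.label * dot(w, ex.features)))

// The quantity training tries to minimize over the weight vector w:
// the sum of the per-example loss over all training examples
def totalLoss(w: Array[Double], data: Seq[Example],
              loss: (Array[Double], Example) => Double): Double =
  data.map(loss(w, _)).sum

An optimizer such as gradient descent repeatedly adjusts w to reduce totalLoss; the particular loss function plugged in determines which model is being trained.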
Tip
For further details on linear models and loss functions, see the linear methods section related to binary classification in the Spark Programming Guide at http://spark.apache.org/docs/latest/mllib-linear-methods.html#binary-classification. Also, see the Wikipedia entry for generalized linear models at http://en.wikipedia.org/wiki/Generalized_linear_model.
While a detailed treatment of linear models and loss functions is beyond the scope of this topic, MLlib provides two loss functions suitable for binary classification (you can learn more about them from the Spark documentation). The first is the logistic loss, which equates to a model known as logistic regression, while the second is the hinge loss, which is equivalent to a linear Support Vector Machine (SVM). Note that the SVM does not strictly fall into the statistical framework of generalized linear models, but it can be used in the same way, as it essentially specifies a loss function and a link function.
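As a concrete illustration of these two options, the following sketch trains both models with MLlib's RDD-based API: LogisticRegressionWithSGD minimizes the logistic loss, while SVMWithSGD minimizes the hinge loss. The local SparkContext and the tiny made-up dataset are assumptions added purely for the example.

import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.{LogisticRegressionWithSGD, SVMWithSGD}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val sc = new SparkContext("local[2]", "loss-functions-example")

// Tiny made-up training set; MLlib's binary classifiers expect labels of 0.0 or 1.0
val training = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(1.0, 0.5)),
  LabeledPoint(0.0, Vectors.dense(-1.0, -0.5))
))

val numIterations = 10
// Logistic loss -> logistic regression model
val lrModel = LogisticRegressionWithSGD.train(training, numIterations)
// Hinge loss -> linear SVM model
val svmModel = SVMWithSGD.train(training, numIterations)

Both calls return a model exposing the same predict method, which is what lets the two losses be swapped while the rest of the pipeline stays unchanged.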
In the following image, we show the logistic loss and hinge loss relative to the actual zero-one loss. The zero-one loss is the true loss for binary classification: it is zero if the model predicts correctly and one if the model predicts incorrectly. The reason it is not used directly is that it is not a differentiable loss function, so it is not possible to easily compute a gradient, which makes it very difficult to optimize.
The other loss functions are approximations to the zero-one loss that make optimization
possible.
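Since the image itself is not reproduced here, the sketch below writes the three losses as functions of the margin y(w . x) (with y in {-1, +1}), which is the form usually plotted: the zero-one loss is a step function, while the logistic and hinge losses are the smoother curves lying above it. The function names and the margin-based formulation are illustrative, not MLlib's API.

// Zero-one loss: 0 for a correct prediction (positive margin), 1 otherwise
def zeroOneLoss(margin: Double): Double = if (margin > 0.0) 0.0 else 1.0

// Logistic loss: a smooth approximation that upper-bounds the zero-one loss
def logisticLoss(margin: Double): Double = math.log(1.0 + math.exp(-margin))

// Hinge loss: a piecewise-linear approximation that also upper-bounds it
def hingeLoss(margin: Double): Double = math.max(0.0, 1.0 - margin)

Evaluating these at a few margins (for example -1.0, 0.0, and 2.0) shows how the logistic and hinge losses penalize confident mistakes heavily while remaining easy to optimize.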