error between what our model predicts and the actual outcomes observed. This process is called model fitting, training, or optimization.
More formally, we seek to find the weight vector that minimizes the sum, over all the training examples, of the loss (or error) computed from some loss function. The loss function takes the weight vector, the feature vector, and the actual outcome for a given training example as input and outputs the loss. In fact, the loss function itself is effectively specified by the link function; hence, for a given type of classification or regression (that is, a given link function), there is a corresponding loss function.
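To make the objective concrete, the following is a minimal, self-contained sketch in plain Scala (not MLlib's own API) of the quantity being minimized: the sum of a per-example loss over the training set. The Example type, the dot and logisticLoss helpers, and the convention that labels are -1 or +1 are all illustrative assumptions for this sketch.

// Illustrative types and helpers; labels are assumed to be -1.0 or +1.0
case class Example(features: Array[Double], label: Double)

def dot(w: Array[Double], x: Array[Double]): Double =
  w.zip(x).map { case (wi, xi) => wi * xi }.sum

// Logistic loss for a single example: log(1 + exp(-y * (w . x)))
def logisticLoss(w: Array[Double], ex: Example): Double =
  math.log(1.0 + math.exp(-ex.label * dot(w, ex.features)))

// The quantity training tries to minimize over the weight vector w:
// the sum of the per-example loss over all training examples
def totalLoss(w: Array[Double], data: Seq[Example],
              loss: (Array[Double], Example) => Double): Double =
  data.map(loss(w, _)).sum

An optimizer such as gradient descent repeatedly adjusts w to reduce totalLoss; the particular loss function plugged in determines which model is being trained.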
Tip
For further details on linear models and loss functions, see the linear methods section related to binary classification in the Spark Programming Guide at http://spark.apache.org/docs/latest/mllib-linear-methods.html#binary-classification. Also, see the Wikipedia entry for generalized linear models at http://en.wikipedia.org/wiki/Generalized_linear_model.
While a detailed treatment of linear models and loss functions is beyond the scope of this topic, MLlib provides two loss functions suitable for binary classification (you can learn more about them from the Spark documentation). The first is the logistic loss, which equates to a model known as logistic regression, while the second is the hinge loss, which is equivalent to a linear Support Vector Machine (SVM). Note that the SVM does not strictly fall into the statistical framework of generalized linear models, but it can be used in the same way, as it essentially specifies a loss function and a link function.
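As a concrete illustration of these two options, the following sketch trains both models with MLlib's RDD-based API: LogisticRegressionWithSGD minimizes the logistic loss, while SVMWithSGD minimizes the hinge loss. The local SparkContext and the tiny made-up dataset are assumptions added purely for the example.

import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.{LogisticRegressionWithSGD, SVMWithSGD}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val sc = new SparkContext("local[2]", "loss-functions-example")

// Tiny made-up training set; MLlib's binary classifiers expect labels of 0.0 or 1.0
val training = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(1.0, 0.5)),
  LabeledPoint(0.0, Vectors.dense(-1.0, -0.5))
))

val numIterations = 10
// Logistic loss -> logistic regression model
val lrModel = LogisticRegressionWithSGD.train(training, numIterations)
// Hinge loss -> linear SVM model
val svmModel = SVMWithSGD.train(training, numIterations)

Both calls return a model exposing the same predict method, which is what lets the two losses be swapped while the rest of the pipeline stays unchanged.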
In the following image, we show the logistic loss and hinge loss relative to the actual zero-one loss. The zero-one loss is the true loss for binary classification: it is zero if the model predicts correctly and one if the model predicts incorrectly. The reason it is not used directly is that it is not a differentiable loss function, so it is not possible to easily compute a gradient, which makes it very difficult to optimize.
The other loss functions are approximations to the zero-one loss that make optimization
possible.
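Since the image itself is not reproduced here, the sketch below writes the three losses as functions of the margin y(w . x) (with y in {-1, +1}), which is the form usually plotted: the zero-one loss is a step function, while the logistic and hinge losses are the smoother curves lying above it. The function names and the margin-based formulation are illustrative, not MLlib's API.

// Zero-one loss: 0 for a correct prediction (positive margin), 1 otherwise
def zeroOneLoss(margin: Double): Double = if (margin > 0.0) 0.0 else 1.0

// Logistic loss: a smooth approximation that upper-bounds the zero-one loss
def logisticLoss(margin: Double): Double = math.log(1.0 + math.exp(-margin))

// Hinge loss: a piecewise-linear approximation that also upper-bounds it
def hingeLoss(margin: Double): Double = math.max(0.0, 1.0 - margin)

Evaluating these at a few margins (for example -1.0, 0.0, and 2.0) shows how the logistic and hinge losses penalize confident mistakes heavily while remaining easy to optimize.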