As mentioned earlier, the link function used in logistic regression is the logit link:
1 / (1 + exp(-w^T x))
The related loss function for logistic regression is the logistic loss:
log(1 + exp(-y w^T x))
Here, y is the actual target variable (either 1 for the positive class or -1 for the negative class).
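To make the two formulas concrete, here is a minimal pure-Python sketch (not taken from the book) that computes the logit link and the logistic loss for a single data point, with w and x as plain lists of floats:

```python
import math

def dot(w, x):
    # Linear predictor w^T x.
    return sum(wi * xi for wi, xi in zip(w, x))

def logistic_link(w, x):
    # Logit link: maps w^T x to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-dot(w, x)))

def logistic_loss(w, x, y):
    # Logistic loss for a label y in {+1, -1}:
    # log(1 + exp(-y * w^T x)).
    return math.log(1.0 + math.exp(-y * dot(w, x)))
```

Note that when w^T x = 0 the link returns exactly 0.5, and the loss equals log(2) regardless of the label; a confident correct prediction drives the loss toward zero, while a confident wrong one grows roughly linearly in the margin.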
Linear support vector machines
SVM is a powerful and popular technique for regression and classification. Unlike logistic
regression, it is not a probabilistic model but predicts classes based on whether the model
evaluation is positive or negative.
The SVM link function is the identity link, so the predicted outcome is:
y = w^T x
Hence, if the evaluation of w^T x is greater than or equal to a threshold of 0, the SVM will assign the data point to class 1; otherwise, it will assign it to class 0 (this threshold is a model parameter of SVM and can be adjusted).
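The decision rule above can be sketched in a few lines of Python (an illustrative helper, not from the book), with the threshold exposed as an adjustable parameter:

```python
def svm_predict(w, x, threshold=0.0):
    # Identity link: the raw model evaluation is just w^T x.
    score = sum(wi * xi for wi, xi in zip(w, x))
    # Class 1 if the score clears the threshold, class 0 otherwise.
    return 1 if score >= threshold else 0
```

Raising the threshold makes the classifier more conservative about predicting class 1, which is a common way to trade precision against recall.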
The loss function for SVM is known as the hinge loss and is defined as:

max(0, 1 - y w^T x)
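As a small illustration (again a sketch, not the book's code), the hinge loss is zero whenever the point is classified correctly with a margin of at least 1, and grows linearly otherwise:

```python
def hinge_loss(w, x, y):
    # Hinge loss for a label y in {+1, -1}:
    # max(0, 1 - y * w^T x).
    z = sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - y * z)
```

This flat region for well-classified points is what makes the SVM a maximum-margin method: points already beyond the margin contribute nothing to the loss, so the optimizer focuses on the points near or on the wrong side of the boundary.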
SVM is a maximum margin classifier—it tries to find a weight vector such that the classes
are separated as much as possible. It has been shown to perform well on many classification tasks, and the linear variant can scale to very large datasets.
Note
SVMs have a large amount of theory behind them, which is beyond the scope of this
topic, but you can visit
http://en.wikipedia.org/wiki/Support_vector_machine
and
http://www.support-vector-machines.org/
for more details.