Information Technology Reference
the form of $b_{k-1} \le b_k$, while the implicit constraint uses redundant training
examples to guarantee the ordinal relationship between thresholds.
2.4.3 Ordinal Regression with Threshold-Based Loss Functions
In [18], different loss functions for ordinal regression are compared. Basically two
types of threshold-based loss functions are investigated, i.e., the immediate-threshold
loss and the all-threshold loss. Here, the thresholds refer to $b_k$ ($k = 1, \ldots, K-1$), which
separate the $K$ ordered categories.
Suppose the scoring function is $f(x)$, and $\phi$ is a margin penalty function. $\phi$ can
be the hinge, exponential, logistic, or square function. Then the immediate-threshold
loss is defined as follows:
$$L(f; x_j, y_j) = \phi\bigl(f(x_j) - b_{y_j-1}\bigr) + \phi\bigl(b_{y_j} - f(x_j)\bigr),$$
where for each labeled example $(x_j, y_j)$, only the two thresholds defining the
"correct" segment $(b_{y_j-1}, b_{y_j})$ are considered. In other words, the immediate-threshold
loss is ignorant of whether multiple thresholds are crossed.
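
The definition above can be illustrated with a minimal Python sketch. It is not taken from [18]; the function names, the choice of the hinge penalty, and the convention that the outermost categories have only one finite threshold (i.e., $b_0 = -\infty$ and $b_K = +\infty$) are assumptions made for illustration.

```python
def hinge(z):
    """Hinge margin penalty: zero once the margin z reaches 1."""
    return max(0.0, 1.0 - z)


def immediate_threshold_loss(score, label, thresholds, phi=hinge):
    """Immediate-threshold loss for a single example.

    score      -- f(x_j), the output of the scoring function
    label      -- y_j, an integer category in 1..K
    thresholds -- [b_1, ..., b_{K-1}], assumed non-decreasing
    Assumption: b_0 = -inf and b_K = +inf, so the first and last
    categories are penalized on one side only.
    """
    num_categories = len(thresholds) + 1
    loss = 0.0
    if label > 1:
        # Lower threshold b_{y_j - 1} is stored at index y_j - 2.
        loss += phi(score - thresholds[label - 2])
    if label < num_categories:
        # Upper threshold b_{y_j} is stored at index y_j - 1.
        loss += phi(thresholds[label - 1] - score)
    return loss


# Hypothetical thresholds for K = 4 categories.
b = [0.0, 2.0, 4.0]
print(immediate_threshold_loss(1.0, 2, b))  # score inside (b_1, b_2) with margin 1: 0.0
print(immediate_threshold_loss(5.0, 2, b))  # score beyond b_2 (and b_3): 4.0
```

In the second call the score has also crossed $b_3$, but only the distance to $b_2$ contributes to the loss, which is the sense in which this loss ignores how many thresholds are crossed.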
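The all-threshold loss is defined as below, which is a sum of all threshold-violation
penalties:
$$L(f; x_j, y_j) = \sum_{k=1}^{K-1} \phi\bigl(s(k, y_j)\,(b_k - f(x_j))\bigr),$$
where
$$s(k, y_j) = \begin{cases} -1, & k < y_j, \\ +1, & k \ge y_j. \end{cases}$$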
Note that the slope of the above loss function increases each time a threshold is
crossed. As a result, solutions are encouraged that minimize the number of
thresholds that are crossed.
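
The growing slope can be checked numerically with a short Python sketch. As before, this is an illustration under stated assumptions rather than the implementation used in [18]: the hinge penalty, the function names, and the threshold values are hypothetical.

```python
def hinge(z):
    """Hinge margin penalty: zero once the margin z reaches 1."""
    return max(0.0, 1.0 - z)


def all_threshold_loss(score, label, thresholds, phi=hinge):
    """All-threshold loss for a single example.

    score      -- f(x_j), the output of the scoring function
    label      -- y_j, an integer category in 1..K
    thresholds -- [b_1, ..., b_{K-1}], assumed non-decreasing
    """
    loss = 0.0
    for k, b_k in enumerate(thresholds, start=1):
        # s(k, y_j) = -1 for thresholds below the correct segment,
        # +1 for thresholds at or above it.
        s = -1.0 if k < label else 1.0
        loss += phi(s * (b_k - score))
    return loss


# Hypothetical thresholds for K = 4 categories; the true label is 1,
# so every threshold should lie above the score.
b = [0.0, 2.0, 4.0]
losses = [all_threshold_loss(score, 1, b) for score in (-1.0, 1.0, 3.0, 5.0)]
print(losses)  # [0.0, 2.0, 6.0, 12.0]
```

The scores are spaced two units apart, so the successive increments 2, 4, and 6 correspond to slopes of 1, 2, and 3: each additional violated threshold adds one more active hinge term.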
The aforementioned two loss functions are tested on the MovieLens dataset. 3 The
experimental results show that the all-threshold loss function can lead to a better
ranking performance than multi-class classification and simple regression methods,
as well as the method minimizing the immediate-threshold loss function.
In this section, we first discuss the relationship between the pointwise approach and
some early learning methods in information retrieval, such as relevance feedback.
Then, we discuss the limitations of the pointwise approach.
3 http://www.grouplens.org/node/73 .