Information Technology Reference
In-Depth Information
2.4 Ordinal Regression-Based Algorithms
Ordinal regression takes the ordinal relationship between the ground truth labels
into consideration when learning the ranking model.
Suppose there are K ordered categories. The goal of ordinal regression is to find
a scoring function f , such that one can easily use thresholds b 1
b 2 ≤···≤
b K 1
to distinguish the outputs of the scoring function (i.e., f(x j ), j =
1 ,...,m )into
different ordered categories. 2
2, the ordinal regression
will naturally reduce to a binary classification. In this regard, the ordinal regression-
based algorithms have strong connections with classification-based algorithms.
In the literature, there are several methods in this sub-category, such as [ 1 , 3 , 6 ,
20 ], and [ 2 ]. We will introduce some of them as follows.
Note that when we set K
=
2.4.1 Perceptron-Based Ranking (PRanking)
PRanking is a famous algorithm on ordinal regression [ 6 ]. The goal of PRanking is
to find a direction defined by a parameter vector w , after projecting the documents
onto which one can easily use thresholds to distinguish the documents into different
ordered categories.
This goal is achieved by means of an iterative learning process. On iteration t ,
the learning algorithm gets an instance x j associated with query q .Given x j ,the
algorithm predicts
w T x j
y j =
ˆ
arg min k {
b k < 0
}
. It then receives the ground truth
label y j . If the algorithm makes a mistake (i.e.,
y j ) then there is at least one
threshold, indexed by k , for which the value of w T x j is on the wrong side of b k .To
correct the mistake, we need to move the values of w T x j and b k toward each other.
An example is shown in Fig. 2.3 . After that, the model parameter w is adjusted by
w
y j =
ˆ
x j , just as in many perceptron-based algorithms. This process is repeated
until the training process converges.
Harrington [ 10 ] later proposes using ensemble learning to further improve the
performance of PRanking. In particular, the training data is first sub-sampled,
and a PRanking model is then learned using each sample. After that, the weights
and thresholds associated with the models are averaged to produce the final
model. It has been proven that in this way a better generalization ability can be
achieved [ 12 ].
=
w
+
2 Note that there are some algorithms, such as [ 11 , 13 ], which were also referred to as ordinal
regression-based algorithms in the literature. According to our categorization, however, they be-
long to the pairwise approach since they do not really care about the accurate assignment of a
document to one of the ordered categories. Instead, they focus more on the relative order between
two documents.
Search WWH ::




Custom Search