Fig. 7.1 Illustration of KNN Offline-1
KNN Offline-1 works as follows (see Fig. 7.1). First, for each training query $q_i$, its $k$ nearest neighbors $N_k(q_i)$ in the query feature space are retrieved. Then, a model $h_{q_i}$ is trained using the labeled data in $N_k(q_i)$, offline and in advance. In the test process, for a new query $q$, one first finds its $k$ nearest neighbors $N_k(q)$, and then compares $N_k(q)$ with every $N_k(q_i)$ $(i = 1, \ldots, n)$ so as to find the one sharing the largest number of instances with $N_k(q)$. Suppose it is $N_k(q_i)$. Then $h_{q_i}$, the model learned from the selected neighborhood $N_k(q_i)$ (it has been created offline and in advance), is used to rank the documents of query $q$.
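To make the two phases concrete, here is a minimal sketch of KNN Offline-1, assuming scikit-learn's NearestNeighbors for the query feature space; `train_ranker` is a hypothetical placeholder for any learning-to-rank trainer (Ranking SVM in the original work), and `labeled_data[i]` is assumed to hold the labeled documents of training query $q_i$:

```python
# Minimal sketch of KNN Offline-1 (assumptions: `train_ranker` stands in for
# any learning-to-rank algorithm; query features are dense vectors).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def offline_phase(query_feats, labeled_data, k, train_ranker):
    """For every training query q_i, retrieve N_k(q_i) and train h_{q_i} offline."""
    nn = NearestNeighbors(n_neighbors=k).fit(query_feats)
    # neighborhoods[i] lists the indices of the k nearest training queries to q_i
    neighborhoods = nn.kneighbors(query_feats, return_distance=False)
    models = [train_ranker([labeled_data[j] for j in neigh])
              for neigh in neighborhoods]
    return nn, neighborhoods, models

def rank_with_offline1(q, nn, neighborhoods, models):
    """Test step: find N_k(q), then select the stored model whose
    neighborhood shares the largest number of queries with N_k(q)."""
    nk_q = set(nn.kneighbors(np.asarray(q).reshape(1, -1),
                             return_distance=False)[0])
    best_i = max(range(len(neighborhoods)),
                 key=lambda i: len(nk_q & set(neighborhoods[i])))
    return models[best_i]  # h_{q_i}, already trained offline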
KNN Offline-2 works as follows (see Fig. 7.2). First, for each training query $q_i$, its $k$ nearest neighbors $N_k(q_i)$ in the query feature space are retrieved. Then, a model $h_{q_i}$ is trained using the labeled data in $N_k(q_i)$, offline and in advance. Instead of seeking the $k$ nearest neighbors of the test query $q$, one only finds its single nearest neighbor in the query feature space. Supposing this nearest neighbor to be $q_i$, the model $h_{q_i}$ trained from $N_k(q_i)$ is used to test query $q$. In this way, one further simplifies the search for $k$ nearest neighbors to that for a single nearest neighbor, and no longer needs to compute the similarity between neighborhoods.
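Continuing the sketch above (same imports and `offline_phase`), the KNN Offline-2 test step reduces to a single nearest-neighbor lookup, with no neighborhood comparison:

```python
# Minimal sketch of the KNN Offline-2 test step; reuses `nn` and `models`
# from the offline_phase sketch above.
def rank_with_offline2(q, nn, models):
    """Find only the single nearest training query q_i of q and
    reuse its precomputed model h_{q_i}."""
    i = nn.kneighbors(np.asarray(q).reshape(1, -1), n_neighbors=1,
                      return_distance=False)[0][0]
    return models[i]
```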
It has been shown that the test complexities of KNN Offline-1 and KNN Offline-2 are both $O(n \log n)$, which is comparable with the single-model approach. At the same time, experimental results in [5] show that the effectiveness of these two offline algorithms is comparable with that of KNN Online. In fact, it has been proven that the approximations of KNN Online by KNN Offline-1 and KNN Offline-2 are accurate in terms of the difference in prediction loss, as long as the learning algorithm used is stable with respect to minor changes in the training examples. Learning-to-rank algorithms with regularization terms in their loss functions, such as Ranking SVM, are typical examples of such stable algorithms.
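The stability notion invoked here can be made precise; one standard formalization (uniform stability, in the sense of Bousquet and Elisseeff) is sketched below. Whether [5] uses exactly this variant is an assumption.

```latex
% Uniform stability (one standard formalization; it is an assumption that
% [5] uses exactly this variant): a learner is \beta-stable if, for every
% training set S, every index i, and every test point z,
\[
  \bigl|\, \ell(h_S, z) - \ell(h_{S^{\setminus i}}, z) \,\bigr| \;\le\; \beta,
\]
% where S^{\setminus i} is S with its i-th example removed, h_S is the model
% learned from S, and \ell is the loss. A small \beta means minor changes in
% the training examples change the predictions only slightly, which is what
% the approximation guarantee for the offline variants requires.
```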