In this chapter, we will try to answer the above questions. In particular, we will
first introduce various models for user click behaviors and discuss how they can be
used to automatically mine ground-truth labels for learning to rank. Then we will
discuss the problem of data selection for learning to rank, which includes document
selection for labeling, and document/feature selection for training.
13.2 Ground Truth Mining from Logs
13.2.1 User Click Models
Most commercial search engines log users' click behaviors during their interaction
with the search interface. Such click logs embed important clues about user
satisfaction with a search engine and can provide a highly valuable source of relevance
information. Compared with human judgments, click information is much cheaper
to obtain and can reflect up-to-date relevance (relevance changes over
time). However, clicks are also known to be biased and noisy. Therefore, it is
necessary to develop models that remove the bias and noise in order to obtain reliable
relevance labels.
Classical click models include the position models [10, 13, 28] and the cascade
model [10]. A position model assumes that a click depends on both relevance and
examination. Each document has a certain probability of being examined, which
decays with, and depends only on, its rank position. A click on a document indicates
that the document was examined and considered relevant by the user. However, this
model treats the individual documents in a search result page independently and fails
to capture the interdependency between documents in the examination probability.
The cascade model assumes that users examine the results sequentially and stop
as soon as a relevant document is clicked. Here, the probability of examination is
indirectly determined by two factors: the rank of the document and the relevance of
all previous documents. The cascade model makes a strong assumption that there
is only one click per search, and hence it cannot explain abandoned searches or
searches with multiple clicks.
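
To make the difference between the two models concrete, the following minimal sketch computes click probabilities under each of them for a single result list. It is an illustration under stated assumptions rather than the formulation used in the cited papers: the function names, the relevance values, and the 1/rank examination decay are all hypothetical choices made for the example.

```python
def position_model_click_probs(relevance, examination):
    """Position model: P(click at rank i) = P(examined at rank i) * P(relevant).

    The examination probability depends only on the rank position, so each
    document is treated independently of the others on the page.
    """
    return [e * r for e, r in zip(examination, relevance)]


def cascade_examination_probs(relevance):
    """Cascade model: the user scans top-down and stops at the first click.

    P(examined at rank i) is the probability that none of the documents
    ranked above i was clicked, i.e. the product of (1 - relevance[j])
    over all j < i.
    """
    probs = []
    no_click_so_far = 1.0
    for r in relevance:
        probs.append(no_click_so_far)
        no_click_so_far *= 1.0 - r
    return probs


def cascade_click_probs(relevance):
    """Cascade model: P(click at rank i) = P(examined at rank i) * relevance[i]."""
    examined = cascade_examination_probs(relevance)
    return [e * r for e, r in zip(examined, relevance)]


# Hypothetical per-document relevance (probability of a click once examined),
# listed in rank order.
relevance = [0.5, 0.2, 0.8]

# Assumed examination curve for the position model: decays as 1 / rank.
examination = [1.0 / (rank + 1) for rank in range(len(relevance))]

position_clicks = position_model_click_probs(relevance, examination)
cascade_examined = cascade_examination_probs(relevance)
cascade_clicks = cascade_click_probs(relevance)

print("position model clicks:", position_clicks)
print("cascade examination:  ", cascade_examined)
print("cascade clicks:       ", cascade_clicks)
```

In this sketch, the position model's examination probabilities are fixed in advance by rank, whereas in the cascade model they are derived from the relevance of the documents ranked above. Because the cascade user stops at the first click, the cascade click probabilities over the list sum to at most one, which reflects the one-click-per-search assumption.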
To sum up, there are at least the following problems with the aforementioned
classical models.
• The models cannot effectively deal with multiple clicks in a session.
• The models cannot distinguish between perceived relevance and actual relevance.
  Because users cannot examine the content of a document until they click on the
  document, the decision to click is made based on perceived relevance. While there
  is a strong correlation between perceived relevance and actual relevance, there are
  also many cases where they differ.
• The models cannot naturally lead to a preference probability on a pair of
  documents, while such preference information is required by many pairwise ranking
  methods.