Browsing features: These features are used to characterize users' interactions with
pages beyond the search result page. For example, one can compute how long
users dwell on a page or domain. Such features allow us to model intra-query
diversity of the page browsing behavior (e.g., navigational queries, on average,
are likely to have shorter page dwell time than transactional or informational
queries).
Click-through features: Clicks are a special case of user interaction with the
search engine. Click-through features used in [1] include the number of clicks
for the result, whether there is a click on the result below or above the current
URL, etc.
Some of the above features (e.g., click-through features and dwell time) are
regarded as biased and only probabilistically related to the true relevance. Such
features can be represented as a mixture of two components: the prior “background”
distribution for the value of the feature, aggregated across all queries, and
the component of the feature influenced by the relevance of the documents.
Therefore, one can subtract the background distribution from the observed
feature value for the document at a given position. This treatment handles
the position bias in the click-through data well.
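The subtraction step can be illustrated with a minimal sketch. It assumes that the background is estimated as the mean feature value at each rank position, pooled over all queries; the function names, the tuple layout, and the sample click frequencies are hypothetical and are not taken from [1].

```python
from collections import defaultdict


def background_by_position(observations):
    """Mean feature value at each rank position, pooled over all queries."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _query, _url, position, value in observations:
        totals[position] += value
        counts[position] += 1
    return {pos: totals[pos] / counts[pos] for pos in totals}


def subtract_background(observations):
    """Map (query, url) to observed value minus the background at its position."""
    background = background_by_position(observations)
    return {
        (query, url): value - background[position]
        for query, url, position, value in observations
    }


# Hypothetical data: (query, url, rank position, click frequency).
observations = [
    ("q1", "a", 1, 0.60),
    ("q1", "b", 2, 0.10),
    ("q2", "c", 1, 0.40),
    ("q2", "d", 2, 0.30),
]
deviations = subtract_background(observations)
```

In this toy data, a result at position 2 that is clicked more often than the position-2 average receives a positive deviation, even though its raw click frequency is lower than that of every result at position 1.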
Given the above features (with the background distribution subtracted),
a general implicit feedback interpretation strategy is learned automatically instead
of relying on heuristics or insights. The general approach is to train a classifier to
induce weights for the user behavior features, and consequently derive a predictive
model of user preferences. Training compares a wide range of implicit behavior
features with explicit human judgments for a set of queries. RankNet
[4] is used as the learning machine.
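The idea of inducing weights from preference pairs can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a linear scoring function instead of the neural network normally associated with RankNet, trains it with the pairwise logistic (cross-entropy) loss by plain stochastic gradient descent, and uses two hypothetical features (click deviation and dwell-time deviation) with made-up values. It is not the configuration used in [1].

```python
import math


def score(weights, features):
    """Linear relevance score for one feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(pairs, n_features, learning_rate=0.1, epochs=200):
    """Fit weights so that each preferred document outscores the other one.

    pairs: list of (preferred_features, other_features), where human judges
    rated the first document as more relevant than the second.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, other in pairs:
            diff = score(weights, preferred) - score(weights, other)
            # Loss is -log(sigmoid(diff)); its derivative w.r.t. diff is
            # -(1 / (1 + exp(diff))).
            grad_scale = -1.0 / (1.0 + math.exp(diff))
            for k in range(n_features):
                weights[k] -= learning_rate * grad_scale * (preferred[k] - other[k])
    return weights


# Hypothetical features: [click deviation, dwell-time deviation].
pairs = [
    ([0.3, 0.5], [-0.1, -0.2]),
    ([0.1, 0.4], [0.0, -0.3]),
    ([0.2, 0.1], [-0.2, 0.0]),
]
weights = train_pairwise(pairs, n_features=2)
```

The learned weights can then be applied to the behavior features of unjudged documents to predict which of two results users would prefer.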
According to the experiments conducted in [1], by using the machine learning
based approach to combine multiple pieces of evidence, one can mine more reliable
ground-truth labels for documents than by relying purely on the click-through
information.
13.2.2.2 Smoothing Click-Through Data
To tackle the sparseness of click-through data, a query clustering technique
is used in [15] to smooth the data.
Suppose we have obtained click-through information for query q and
document d. The basic idea is to propagate the click-through information to other similar
queries. To determine the similar queries, the co-click principle (queries for
which users have clicked on the same documents can be considered to be similar) is
employed. Specifically, a random walk model is used to derive the query similarity
in a dynamic manner.
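The co-click principle can be illustrated with a small sketch. It assumes a two-step random walk (query to document, then document back to query) over raw click counts, which is only one simple way to turn co-clicks into a similarity score and is not necessarily the walk used in [15]. The matrix `clicks`, with one row per query and one column per document, and all helper names are hypothetical.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def query_similarity(clicks):
    """Probability of reaching query k from query i via one co-clicked document."""
    query_to_doc = row_normalize(clicks)
    doc_to_query = row_normalize(transpose(clicks))
    return matmul(query_to_doc, doc_to_query)


# Hypothetical click counts: rows are queries, columns are documents.
clicks = [
    [4, 1, 0],  # query 0
    [2, 2, 0],  # query 1 shares documents 0 and 1 with query 0
    [0, 0, 5],  # query 2 shares no clicked document with the others
]
similarity = query_similarity(clicks)

# Propagate clicks: each query receives clicks from queries similar to it.
smoothed = matmul(similarity, clicks)
```

Queries 0 and 1 end up with non-zero mutual similarity because they share clicked documents, while query 2 stays isolated, so no click information is propagated to or from it.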
For this purpose, a click graph, which is a bipartite-graph representation of
click-through data, is constructed. $\{q_i\}_{i=1}^{m}$ represents a set of query nodes
and $\{d_j\}_{j=1}^{n}$ represents a set of document nodes. Then the bipartite graph can
be represented by an $m \times n$ matrix $W$, in which $W_{i,j}$ represents the click
information associated