relevance since it measures the probability of a click based on the URL. The second
one is the probability that the user is satisfied given that he has clicked on the link;
so it can be understood as a 'ratio' between actual and perceived relevance, and
the true relevance of the document can be computed as the product $a_u s_u$.
With the DBN model defined as above, the Expectation-Maximization (EM) algorithm
can be used to find the maximum likelihood estimates of the variables $a_u$
and $s_u$. The parameter γ is treated as a configurable parameter of the model and is
not considered in the parameter estimation process.
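
To make the roles of $a_u$ and $s_u$ concrete, the following is a minimal sketch and not the EM procedure itself. It assumes the special case γ = 1, in which a user keeps examining results until satisfied, so the last click in a session is taken as the satisfying one and every result at or above it is taken as examined. Under that assumption the estimates reduce to simple counts. The function name, the session format, and the choice to skip sessions without clicks are illustrative assumptions.

```python
from collections import defaultdict


def estimate_simplified_dbn(sessions):
    """Count-based estimates of (a_u, s_u, a_u * s_u), assuming gamma = 1.

    sessions: list of sessions; each session is a list of (url, clicked)
    pairs in rank order. This input format is a hypothetical one.
    """
    a_num = defaultdict(int)  # clicks on u while examined
    a_den = defaultdict(int)  # times u was (assumed) examined
    s_num = defaultdict(int)  # times u was the last click in a session
    s_den = defaultdict(int)  # clicks on u

    for session in sessions:
        clicked_positions = [i for i, (_, c) in enumerate(session) if c]
        if not clicked_positions:
            continue  # assumption: sessions without clicks are skipped
        last = clicked_positions[-1]
        # Everything at or above the last click is treated as examined.
        for i, (url, clicked) in enumerate(session[: last + 1]):
            a_den[url] += 1
            if clicked:
                a_num[url] += 1
                s_den[url] += 1
                if i == last:
                    s_num[url] += 1

    estimates = {}
    for url in a_den:
        a_u = a_num[url] / a_den[url]
        s_u = s_num[url] / s_den[url] if s_den[url] else 0.0
        estimates[url] = (a_u, s_u, a_u * s_u)  # last entry: relevance
    return estimates
```

For a general γ the examination events are hidden, and the counts above would have to be replaced by expected counts computed in the E-step of EM.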
13.2.2 Click Data Enhancement
In the previous subsection, we have introduced various click models for ground truth
mining. These models can be effective; however, they also have certain limitations.
First, although the click information is very helpful, it is not the only information
source that can be used to mine ground-truth labels. For example, the content infor-
mation about the query and the clicked documents can also be very helpful. More
reliable labels are expected to be mined if one can use more comprehensive information
for the task. Second, it is almost unavoidable that the labels mined from click-through
logs are highly sparse. There are three possible reasons: (i) the click-through
logs from a search engine company may not cover all the users' behaviors due to
its limited market share; (ii) since the search results provided by existing search en-
gines are far from perfect, it is highly possible that no document is relevant with
respect to some queries and therefore there will be no clicks for such queries; (iii)
users may issue new queries constantly, and therefore historical click-through logs
cannot cover newly issued queries.
To tackle the aforementioned problems, in [1], Agichtein et al. consider more
information and learn a user interaction model from training data, and in [15], some
smoothing techniques are used to expand the sparse click data. We will introduce
these two pieces of work in detail in this subsection.
13.2.2.1 Learning a User Interaction Model
In [1], a rich set of features is used to characterize whether a user will be satisfied
with a web search result. Once the user has submitted a query, he/she will perform
many different actions (e.g., reading snippets, clicking results, navigating, and re-
fining the query). To capture and summarize these actions, three groups of features
are used: query-text, click-through, and browsing.
Query-text features: Users decide which results to examine in more detail by looking
at the result title, URL, and snippet. In many cases, looking at the original
document is not even necessary. To model this aspect of user experience, features
that characterize the nature of the query and its relation to the snippet text are
extracted, including overlap between the words in the title and in the query, the
fraction of words shared by the query and the snippet, etc.
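
As an illustration of the two query-text features named above, here is a minimal sketch. The tokenization (lowercased alphanumeric word sets), the normalization of title overlap by query length, and the use of the Jaccard ratio for the "fraction of words shared" are assumptions made for this example; the exact definitions used in [1] may differ.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_text_features(query, title, snippet):
    """Compute two illustrative query-text features for one search result."""
    q, t, s = tokenize(query), tokenize(title), tokenize(snippet)

    # Fraction of query words that also appear in the result title.
    title_overlap = len(q & t) / len(q) if q else 0.0

    # Words shared by query and snippet, relative to all words in either
    # (Jaccard ratio; an assumed reading of "fraction of words shared").
    union = q | s
    snippet_shared = len(q & s) / len(union) if union else 0.0

    return {
        "title_overlap": title_overlap,
        "snippet_shared_fraction": snippet_shared,
    }
```

For example, the query "learning to rank" against the title "Learning to Rank for Information Retrieval" has every query word in the title, so its title overlap is 1.0.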