learn the structure of the graph or to estimate the conditional probabilities
defined on the graphs, and thus the model structure and parameter estimations
are rather ad hoc (24). Another example is the language modeling approach,
which is a statistical approach that models the document generation process.
This approach has been a very active research area in the IR community since
the late 1990s (20).
8.3.2 Existing Adaptive Filtering Approaches
The key component of an adaptive filtering system is the user profile used by
the system to make the decision of whether to deliver a document to the user
or not. In early research work, as well as in some recent commercial filtering
systems, a user profile is represented as Boolean logic (25). With growing
computational power and advances in research in the information retrieval
community over the last 20 years, filtering systems have gone beyond simple
Boolean queries and represent a user profile as a vector, a statistical
distribution of words, or some other representation. Much of the research on
adaptive filtering focuses on learning a user profile, while interacting with
the user, from explicit feedback on whether the user likes a document. In
general, there are two major approaches.
8.3.2.1 Filtering as retrieval + thresholding
A typical retrieval system has a static information source, and the task is
to return a ranking of documents in response to a short-term user request.
Because of the influence of retrieval models, some existing filtering systems
use a “retrieval scoring + thresholding” approach and build adaptive
filtering on algorithms originally designed for the retrieval task. A
filtering system uses a retrieval algorithm to score each incoming document
and delivers the document to the user if and only if the score is above a
dissemination threshold. Some examples of retrieval models that have been
applied to the adaptive filtering task are: Rocchio, language models, Okapi,
and pseudo relevance feedback (3) (12) (35) (5) (19) (54).
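The delivery rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scorer here is a hypothetical weighted term-count sum standing in for a real retrieval model such as Rocchio or Okapi, and the names `score`, `filter_stream`, the profile weights, and the threshold value are invented for the example.

```python
from collections import Counter


def score(profile, document):
    """Hypothetical scorer (an assumption, not a real retrieval model):
    sum of profile term weights times term counts in the document."""
    counts = Counter(document.lower().split())
    return sum(weight * counts[term] for term, weight in profile.items())


def filter_stream(profile, documents, threshold):
    """Yield a document if and only if its score is above the
    dissemination threshold."""
    for doc in documents:
        if score(profile, doc) > threshold:
            yield doc


# Illustrative profile and document stream (made-up data).
profile = {"filtering": 1.0, "retrieval": 0.5}
stream = [
    "adaptive filtering of news streams",
    "a recipe for tomato soup",
    "retrieval models for filtering and retrieval",
]
delivered = list(filter_stream(profile, stream, threshold=0.75))
```

Unlike a retrieval system, which would sort all three documents by score, the filter makes a binary decision on each document as it arrives, which is why the choice of threshold matters.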
A threshold is not needed in a retrieval task, because the system only needs
to return a ranked list of documents. A major research topic in the adaptive
filtering community is how to set dissemination thresholds (48) (7) (63)
(6) (72) (68). The criteria for setting thresholds are often expressed in an
easy-to-understand form, such as the utility function described in Section 8.2. At
each time point, the system learns a threshold from the relevance judgements
collected so far. For example, one direct utility optimization technique is
to compute the utility on the training data for each candidate threshold, and
choose the threshold that gives the maximum utility. The score-distribution-based
approach assumes generative models of the scores for relevant documents and
non-relevant documents. For example, one can assume the scores of relevant
documents follow a Gaussian distribution, and the scores for non-relevant