Adaptive Information Filtering - Text Mining: Classification, Clustering, and Applications


learn the structure of the graph or to estimate the conditional probabilities

defined on the graphs, and thus the model structure and parameter estimations

are rather ad hoc (24). Another example is the language modeling approach,

which is a statistical approach that models the document generation process.

This approach has been a very active research area in the IR community since the

late 1990s (20).

8.3.2 Existing Adaptive Filtering Approaches

The key component of an adaptive filtering system is the user profile used by

the system to make the decision of whether to deliver a document to the user

or not. In the early research work as well as some recent commercial filtering

systems, a user profile is represented as Boolean logic (25). With growing

computational power and advances in information retrieval research over

the last 20 years, filtering systems have gone beyond simple

Boolean queries and represent a user profile as a vector, a statistical

distribution of words, or some other representation. Much of the research on adaptive

filtering is focused on learning a user profile from explicit user feedback on

whether the user likes a document or not while interacting with the user. In

general, there are two major approaches.

8.3.2.1 Filtering as retrieval + thresholding

A typical retrieval system has a static information source, and the task is

to return a ranking of documents in response to a short-term user request.

Because of the influence of the retrieval models, some existing filtering systems

use a “retrieval scoring + thresholding” approach for filtering and build adaptive

filtering based on algorithms originally designed for the retrieval task. A

filtering system uses a retrieval algorithm to score each incoming document

and delivers the document to the user if and only if the score is above a

dissemination threshold. Some examples of retrieval models that have been

applied to the adaptive filtering task are: Rocchio, language models, Okapi,

and pseudo relevance feedback (3) (12) (35) (5) (19) (54).
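
The delivery rule described above can be illustrated with a minimal sketch. It assumes a bag-of-words profile vector and a cosine similarity score, which are stand-ins chosen for brevity and not one of the cited retrieval models; the function names, the whitespace tokenization, the sample documents, and the threshold value of 0.3 are all hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Build a term-frequency vector (assumed whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine_score(profile, doc):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * doc.get(term, 0) for term, weight in profile.items())
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    if norm_p == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_p * norm_d)


def should_deliver(profile, doc_text, threshold):
    """Deliver if and only if the score is above the dissemination threshold."""
    return cosine_score(profile, to_vector(doc_text)) > threshold


# Hypothetical user profile and incoming document stream.
profile = to_vector("adaptive information filtering user profile")
stream = [
    "adaptive filtering learns a user profile",
    "stock markets closed higher today",
]
delivered = [doc for doc in stream if should_deliver(profile, doc, threshold=0.3)]
```

Unlike a retrieval system, this loop makes a binary decision for each document as it arrives instead of ranking a static collection, which is why the threshold matters.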

A threshold is not needed in a retrieval task, because the system only needs

to return a ranked list of documents. A major research topic in the adaptive

filtering community is how to set dissemination thresholds (48) (7) (63)

(6) (72) (68). The criteria for setting thresholds are often expressed in an

easy-to-understand form, such as the utility function described in Section 8.2. At

each time point, the system learns a threshold from the relevance judgements

collected so far. For example, one direct utility optimization technique is

to compute the utility on the training data for each candidate threshold, and

choose the threshold that gives the maximum utility. The score-distribution-based

approach assumes generative models of scores for relevant documents and

non-relevant documents. For example, one can assume the scores of relevant

documents follow a Gaussian distribution, and the scores for non-relevant
