The problem with using MLE is that if a word never occurs in document $d$, it will get a zero probability ($P(w_k \mid d) = 0$). Thus a word that occurs in $d_t$ but not in $d_j$ will make $KL(\theta_{d_t}, \theta_{d_j}) = \infty$.
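The zero-probability problem is easy to reproduce. The Python sketch below builds maximum likelihood unigram models for two toy documents and computes the KL divergence between them; the documents and the function names `mle_model` and `kl_divergence` are illustrative, not from the original. A single word of $d_t$ that is missing from $d_j$ is enough to drive the sum to infinity.

```python
import math
from collections import Counter


def mle_model(tokens):
    """Maximum likelihood unigram model: P(w | d) = count(w, d) / |d|."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def kl_divergence(p, q):
    """KL(p, q) = sum_w p(w) * log(p(w) / q(w)) over the words of p.

    Returns math.inf as soon as some word has p(w) > 0 but q(w) == 0.
    """
    total = 0.0
    for w, pw in p.items():
        qw = q.get(w, 0.0)
        if qw == 0.0:
            return math.inf
        total += pw * math.log(pw / qw)
    return total


# Illustrative documents: d_t contains "sharply", d_j does not.
d_t = "stock prices fell sharply today".split()
d_j = "stock prices fell today".split()

theta_t = mle_model(d_t)
theta_j = mle_model(d_j)
print(kl_divergence(theta_t, theta_j))  # inf
```

The divergence in the other direction, from $\theta_{d_j}$ to $\theta_{d_t}$, is finite here because every word of $d_j$ also occurs in $d_t$, which shows how the asymmetry of KL interacts with unseen words.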
Smoothing techniques are necessary to adjust the maximum likelihood estimation so that unseen words receive a small nonzero probability and the KL-based measure remains finite. Research shows that retrieval and filtering performance is highly sensitive to smoothing parameters when using language models. Several smoothing methods have been applied to ad hoc information retrieval, text classification problems, and novelty detection (69)(73).
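The text does not commit to a particular smoothing method. As one illustration, the sketch below assumes Jelinek-Mercer smoothing, which linearly interpolates each document's MLE model with a collection model, $P(w \mid d) = (1 - \lambda) P_{ml}(w \mid d) + \lambda P(w \mid C)$. The weight $\lambda = 0.5$ and the tiny two-document collection are hypothetical choices made only for the example. Because every collection word now has nonzero probability under both models, the divergence that was infinite under MLE becomes finite.

```python
import math
from collections import Counter


def smoothed_model(doc_tokens, collection_tokens, lam=0.5):
    """Jelinek-Mercer smoothing (an assumed choice, lam is hypothetical):
    P(w | d) = (1 - lam) * P_ml(w | d) + lam * P(w | C).
    Defined over the whole collection vocabulary, so no collection word gets zero.
    """
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(collection_tokens)
    doc_len = len(doc_tokens)
    coll_len = len(collection_tokens)
    return {
        w: (1 - lam) * doc_counts[w] / doc_len + lam * c / coll_len
        for w, c in coll_counts.items()
    }


def kl_divergence(p, q):
    """KL(p, q) over the words with p(w) > 0; inf if q(w) == 0 for such a word."""
    total = 0.0
    for w, pw in p.items():
        if pw == 0.0:
            continue
        qw = q.get(w, 0.0)
        if qw == 0.0:
            return math.inf
        total += pw * math.log(pw / qw)
    return total


d_t = "stock prices fell sharply today".split()
d_j = "stock prices fell today".split()
collection = d_t + d_j  # hypothetical tiny collection

theta_t = smoothed_model(d_t, collection)
theta_j = smoothed_model(d_j, collection)
print(kl_divergence(theta_t, theta_j))  # finite and positive
```

Dirichlet prior smoothing is a common alternative. Given the sensitivity to smoothing parameters noted above, $\lambda$ (or the Dirichlet parameter) would normally be tuned on held-out data rather than fixed as here.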
8.5.4 Summary of Novelty Detection
The work described above focuses on the redundancy measure, and it is somewhat user independent in the sense that our redundancy measures only calculate a score indicating the degree of redundancy in a document given a history of delivered documents. They do not actually decide whether a document is redundant or novel.
A redundancy threshold is needed to classify a document as redundant or novel. When human assessors were asked to make redundancy decisions given the same topics and document streams, they sometimes disagreed. In some cases the disagreement stemmed from differences in the assessors' internal definitions of redundancy. More often, however, the assessors differed in how much overlap they would tolerate: one assessor might feel that a document $d_t$ should be considered redundant if a previously seen document $d_j$ covered 80% of $d_t$, while another might not consider it redundant unless the coverage was more than 95%. A person's tolerance for redundancy can be modeled with a user-dependent threshold that converts a redundancy score into a redundancy decision. User feedback about which documents are redundant can serve as training data. Over time the system can learn to estimate the probability that a new document with a given redundancy score would be considered redundant. This probability can be expressed as $P(\text{user } j \text{ thinks } d_t \text{ is redundant} \mid R(d_t \mid D_t))$, where $R(d_t \mid D_t)$ is the redundancy score of $d_t$ given the history $D_t$ of previously delivered documents.
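One way to learn such a user-dependent mapping is to fit a logistic regression from redundancy scores to each user's feedback. The sketch below is an assumption, not necessarily the method used in the original work: the feedback for users A and B is invented to mirror the 80% versus 95% example, the helper names `fit_user_model` and `p_redundant` are hypothetical, scores are standardized before training, and the learning rate and epoch count are arbitrary values that suffice for this toy data. A document is treated as redundant when the estimated probability exceeds 0.5.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_user_model(scores, labels, lr=1.0, epochs=20000):
    """Fit P(redundant | score) = sigmoid(a * x + b), x = standardized score,
    by batch gradient descent on the mean log-loss.

    scores: redundancy scores R(d_t | D_t) of delivered documents
    labels: 1 if the user marked the document redundant, 0 if novel
    """
    n = len(scores)
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n) or 1.0
    xs = [(s - mean) / std for s in scores]
    a = b = 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(a * x + b) - y  # d(log-loss)/d(logit)
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return mean, std, a, b


def p_redundant(score, model):
    """Estimated probability that this user judges a document with `score` redundant."""
    mean, std, a, b = model
    return sigmoid(a * (score - mean) / std + b)


# Hypothetical feedback: user A already calls ~80% coverage redundant,
# user B only above ~95%.
scores_a = [0.20, 0.40, 0.60, 0.70, 0.82, 0.85, 0.93, 0.97]
labels_a = [0, 0, 0, 0, 1, 1, 1, 1]
scores_b = [0.30, 0.60, 0.80, 0.88, 0.96, 0.97, 0.98, 0.99]
labels_b = [0, 0, 0, 0, 1, 1, 1, 1]

model_a = fit_user_model(scores_a, labels_a)
model_b = fit_user_model(scores_b, labels_b)

score = 0.86
for name, model in (("A", model_a), ("B", model_b)):
    p = p_redundant(score, model)
    print(name, round(p, 3), "redundant" if p > 0.5 else "novel")
```

With this data the same score of 0.86 comes out redundant for user A and novel for user B, which is the user dependence a single global threshold cannot capture. Standardizing the scores keeps plain gradient descent well conditioned; with perfectly separable feedback the weights keep growing, so a real system would likely add regularization.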
8.6 Other Adaptive Filtering Topics
While learning user profiles is an advantage of a filtering system, it is also a major challenge for the adaptive filtering research community. Common learning algorithms require a significant amount of training data. However, a real-world filtering system must work as soon as the user uses the system
 