of supervised learning algorithms (e.g., Rocchio-style classifiers, Exponential-
Gaussian models, local regression and logistic regression approaches) have
been studied in adaptive settings with explicit and implicit relevance feedback,
and on benchmark datasets from TREC (Text Retrieval Conferences) and the
TDT (Topic Detection and Tracking) evaluation forum (1; 5; 8; 18; 25; 31;
29). Regularized logistic regression (26), for example, is one of the strongest-performing
methods in terms of both effectiveness and efficiency, and it scales easily to
frequent adaptations over large datasets such as the TREC-10 corpus, with over
800,000 documents and 84 topics.
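To make the idea of a frequently adapted, regularized logistic regression profile concrete, here is a minimal sketch. It assumes one model per topic, documents represented as sparse {term: weight} dicts, and a plain stochastic-gradient update on the L2-regularized log loss each time a relevance judgment arrives. The class name, the learning rate, the penalty strength and the 0.5 delivery threshold are illustrative assumptions, not the configuration used in (26).

```python
import math


class OnlineLogisticFilter:
    """Hypothetical per-topic filtering profile: L2-regularized logistic
    regression updated by one SGD step per relevance-feedback label."""

    def __init__(self, learning_rate=0.1, l2=0.01, threshold=0.5):
        self.w = {}                  # sparse term weights
        self.b = 0.0                 # bias (not regularized)
        self.lr = learning_rate
        self.l2 = l2
        self.threshold = threshold   # assumed delivery threshold on P(relevant)

    def prob_relevant(self, doc):
        z = self.b + sum(self.w.get(t, 0.0) * x for t, x in doc.items())
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def deliver(self, doc):
        """Decide whether to show the document to the user."""
        return self.prob_relevant(doc) >= self.threshold

    def update(self, doc, label):
        """One SGD step for feedback label 1 (relevant) or 0 (not relevant)."""
        err = self.prob_relevant(doc) - label  # d(log loss)/dz at current weights
        # Gradient of the L2 penalty shrinks every weight toward zero.
        for t in self.w:
            self.w[t] -= self.lr * self.l2 * self.w[t]
        # Gradient of the log loss touches only the terms in this document.
        for t, x in doc.items():
            self.w[t] = self.w.get(t, 0.0) - self.lr * err * x
        self.b -= self.lr * err
```

Because each update costs time proportional to the vocabulary seen so far (dominated by the shrinkage loop), a model like this can be refreshed after every judgment; production systems usually apply the penalty lazily instead, which this sketch omits for clarity.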
9.1.2 Related Work in Topic Detection and Tracking (TDT)
Topic Detection and Tracking (TDT) research focuses on automated
detection and tracking of news events from multiple sources of temporally
ordered stories (2). TDT has two primary tasks: topic tracking and novelty
detection. The topic tracking task, although defined independently, is almost
identical to the adaptive filtering task, except that user feedback is assumed
to be unavailable; pseudo-relevance feedback (PRF) generated by the system is
allowed, however. In PRF, the system treats the top-ranking documents in the
retrieved list for a topic as relevant and uses them to adapt that topic's
profile. PRF may be useful when training examples are sparse and true
relevance feedback is insufficient (26).
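A minimal sketch of one PRF adaptation step follows. It assumes sparse {term: weight} vectors, uses cosine similarity to the current profile as the retrieval score, and moves the profile toward the centroid of the top k documents with a Rocchio-style update. The Rocchio update, the function names and the parameters k, alpha and beta are illustrative choices, not the procedure of any specific system cited here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def prf_update(profile, docs, k=2, alpha=1.0, beta=0.5):
    """Hypothetical PRF step: rank docs by similarity to the profile, assume
    the top k are relevant (no user judgment), and return
    alpha * profile + beta * centroid(top k)."""
    ranked = sorted(docs, key=lambda d: cosine(profile, d), reverse=True)
    top = ranked[:k]
    if not top:
        return dict(profile)
    new = {t: alpha * w for t, w in profile.items()}
    for d in top:
        for t, w in d.items():
            new[t] = new.get(t, 0.0) + beta * w / len(top)
    return new
```

The risk the text alludes to is visible here: if the top k contain off-topic documents, their terms are added to the profile with no correction, so k and beta are typically kept small.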
Novelty detection (ND), the other primary task in TDT, aims to detect the
first report of each new event from temporally ordered news stories. The task
is also called First-Story Detection (FSD) or New Event Detection (NED).
There has been a significant body of work on ND problems.
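For orientation, a common nearest-neighbor baseline for first-story detection scores each incoming story by its dissimilarity to the most similar earlier story and declares a new event when that score exceeds a threshold. The formulation below is a sketch, not a method taken from the works cited in this section; the vector representation $\mathbf{d}_i$ of the $i$-th story and the threshold $\theta$ are assumptions:

```latex
\mathrm{novelty}(d_n) = 1 - \max_{1 \le i < n} \cos(\mathbf{d}_n, \mathbf{d}_i),
\qquad
\cos(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\lVert \mathbf{x} \rVert \, \lVert \mathbf{y} \rVert},
\qquad
d_n \text{ is flagged as a first story} \iff \mathrm{novelty}(d_n) > \theta .
```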
Yang et al. (23) examined incremental clustering for grouping documents
into events, and used cosine similarity combined with a time-decay function
to measure the novelty of new documents with respect to historical events.
Zhang et al. (30) developed a Bayesian statistical framework that models the
growing number of events over time with a nonparametric Dirichlet process.
Yang et al. (24) studied the effective use of named entities in modeling the
novelty of documents conditioned on events and higher-level topics. Zhang et
al. (32) compared alternative measures for sentence-level novelty detection,
conditioned on perfect knowledge of document-level relevance; cosine
similarity worked best in their experiments. Allan et al. (3) argued for the
importance of comparing novelty measures under a more realistic assumption,
namely that sentence-level relevance is not given but predicted by a system.
Kuo et al. (12) developed an indexing-tree strategy for fast computation and
investigated the use of named entities.
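The combination of incremental clustering and time-decayed cosine similarity described for Yang et al. (23) can be sketched as follows. This is a minimal illustration under stated assumptions, not their exact formulation: stories are sparse {term: weight} dicts with a numeric timestamp, each cluster's similarity is multiplied by a half-life decay based on the age of its most recent story, and the class name and the parameters `threshold` and `half_life` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class EventClusterer:
    """Hypothetical single-pass event clustering with time-decayed similarity."""

    def __init__(self, threshold=0.3, half_life=10.0):
        self.threshold = threshold   # minimum decayed similarity to join a cluster
        self.half_life = half_life   # time after which similarity counts half
        # Each cluster: summed term vector (cosine ignores scale, so the sum
        # acts as a centroid) and the timestamp of its most recent story.
        self.clusters = []

    def process(self, doc, time):
        """Return (cluster_index, is_new_event) for a story arriving at `time`."""
        best_idx, best_score = None, 0.0
        for i, c in enumerate(self.clusters):
            decay = 0.5 ** ((time - c["last_time"]) / self.half_life)
            score = cosine(doc, c["centroid"]) * decay
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is None or best_score < self.threshold:
            # Not close enough to any recent event: treat as a first story.
            self.clusters.append({"centroid": dict(doc), "last_time": time})
            return len(self.clusters) - 1, True
        c = self.clusters[best_idx]
        for t, w in doc.items():
            c["centroid"][t] = c["centroid"].get(t, 0.0) + w
        c["last_time"] = time
        return best_idx, False
```

The decay term is what lets a story that closely matches a long-dormant event still be reported as new: with a 10-unit half-life, a cluster last updated 200 units earlier contributes essentially nothing to the score.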