Figure 9: Freshness of Articles
delivery. One notable feature of push-based, multiple-channel information
dissemination systems is that they send information to users in the form of
time-series articles.
To find interesting information for users in a large quantity of data,
information filtering techniques and search engines, which are mainly based on
keywords, have been very useful. However, since the keywords of incoming
news articles are sometimes unknown, these typical methods may fail to acquire
the fresh (or popular) articles. Freshness, popularity and urgency are defined
here as time-series features of news articles 9, 10). These features can be used
to filter the time-series articles to acquire fresh, popular and urgent news.
4.1 Freshness
Articles that are quite different from previously selected articles are
valuable. In other words, such articles have freshness and uniqueness. Indeed,
in some cases, they may be scoop news.
As shown in Figure 9, the freshness of an article a can be estimated by
• the number of its similar articles in a retrospective scope, denoted by
  fresh_num(a),
• the dissimilarity between a and the past articles in a retrospective scope,
  denoted by fresh_cd(a),
• the density of its similar articles in a retrospective scope, denoted by
  fresh_de(a), and
• the time distance between a and its similar articles in a retrospective
  scope, denoted by fresh_td(a, ω).
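The four measures above can be illustrated with a minimal Python sketch. It is not the authors' formulation. Everything concrete in it is an assumption made for illustration: the Article type, the bag-of-words term vectors, cosine similarity with a fixed threshold as the test for "similar", a time window of width omega as the retrospective scope, and the specific way each quantity is computed (a raw count, one minus cosine similarity to the summed past vector, a ratio, and the gap to the most recent similar article).

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    time: float   # publication time in arbitrary units (assumed)
    terms: dict   # term -> weight, a bag-of-words vector (assumed)


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def scope(a, history, omega):
    """Past articles within the last `omega` time units before `a` (assumed scope)."""
    return [p for p in history if a.time - omega <= p.time < a.time]


def similar(a, past, threshold):
    """Articles whose cosine similarity to `a` reaches the (assumed) threshold."""
    return [p for p in past if cosine(a.terms, p.terms) >= threshold]


def fresh_num(a, history, omega, threshold=0.5):
    # Number of similar articles in the scope; fewer suggests a fresher article.
    return len(similar(a, scope(a, history, omega), threshold))


def fresh_cd(a, history, omega):
    # Dissimilarity to the past articles, taken here as
    # 1 - cosine(a, sum of past term vectors). This choice is an assumption.
    summed = {}
    for p in scope(a, history, omega):
        for t, w in p.terms.items():
            summed[t] = summed.get(t, 0.0) + w
    return 1.0 - cosine(a.terms, summed)


def fresh_de(a, history, omega, threshold=0.5):
    # Density of similar articles: share of scope articles that are similar.
    past = scope(a, history, omega)
    return len(similar(a, past, threshold)) / len(past) if past else 0.0


def fresh_td(a, history, omega, threshold=0.5):
    # Time gap to the most recent similar article; `omega` if there is none.
    sims = similar(a, scope(a, history, omega), threshold)
    return min((a.time - p.time for p in sims), default=omega)
```

Under these assumptions, a low fresh_num and fresh_de together with a high fresh_cd and fresh_td would all point toward a fresh article. How the source normalizes and combines the four values is not shown here.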
The integrated freshness of an article a compared with articles in a
retrospective scope, denoted by fresh(a), is also defined as follows:
(4.1)
(4.2)
(4.3)
(4.4)