where α, β, γ, and s are the weight values. The measurements are computed over the set of articles similar to a that are in the scope O. Let m be the number of articles in this set of similar articles, and let n be the number of articles in O.
The above four types of freshness measurements are defined as follows.
Freshness based on the number of similar articles. When few articles in O are similar to a, a can be regarded as new, and its freshness is considered to be high. So, we have
(4.5)
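As a rough illustration of this measure, the sketch below maps the count m of similar articles to a score that falls as m grows. The formula itself is not shown in this text, so the function name `freshness_by_count` and the 1 / (1 + m) shape are assumptions chosen only to show the intended monotonic behaviour.

```python
def freshness_by_count(m: int) -> float:
    """Illustrative score: high when few similar articles exist.

    Assumption: 1 / (1 + m) is a stand-in, not the formula of (4.5).
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    return 1.0 / (1.0 + m)
```

With this stand-in, an article with no similar articles scores 1.0, and the score shrinks toward 0 as more similar articles appear.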
Freshness based on the content distance. The content distance between articles a and b can be defined as follows:
(4.6)
The content distance represents how much new information has been added to a compared with b. Therefore, as the content distance between a and its similar articles becomes larger, the freshness of a is considered to become higher. Thus, we have
(4.7)
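One way to picture this measure is with articles reduced to keyword sets. The sketch below is a hypothetical stand-in, not the definitions of (4.6) and (4.7): `content_distance` is assumed to be the share of a's terms that are absent from b, and `freshness_by_content` is assumed to be the mean of that distance over the similar articles.

```python
from typing import List, Set


def content_distance(a_terms: Set[str], b_terms: Set[str]) -> float:
    """Share of a's terms that do not occur in b.

    Assumption: a simple set-difference ratio, standing in for (4.6).
    """
    if not a_terms:
        return 0.0
    return len(a_terms - b_terms) / len(a_terms)


def freshness_by_content(a_terms: Set[str], similar: List[Set[str]]) -> float:
    """Mean content distance from a to its similar articles.

    Assumption: averaging is an illustrative choice, standing in for (4.7).
    """
    if not similar:
        return 1.0  # assumption: nothing to compare against means maximally fresh
    return sum(content_distance(a_terms, b) for b in similar) / len(similar)
```

The distance here is deliberately asymmetric: it counts only what a adds relative to b, which matches the idea of new information added to a.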
Freshness based on the density of similar articles. The density d of the articles similar to a in O is computed as m/n. Here, d can be considered as the appearance probability of a in O. When d is small, a is rare, and its freshness is considered to be high. So, we have
(4.8)
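The ratio d = m/n comes directly from the text and is easy to compute. How d is turned into a freshness score is not shown here, so the 1 - d mapping in `freshness_by_density` below is only an assumption that respects the stated behaviour (small d, high freshness). Both function names are hypothetical.

```python
def density(m: int, n: int) -> float:
    """d = m / n, the share of articles in the scope that are similar to a."""
    if n <= 0:
        raise ValueError("the scope must contain at least one article")
    if not 0 <= m <= n:
        raise ValueError("m must lie between 0 and n")
    return m / n


def freshness_by_density(m: int, n: int) -> float:
    """Assumed stand-in for (4.8): the score rises as the density d falls."""
    return 1.0 - density(m, n)
```

For example, 2 similar articles in a scope of 8 give d = 0.25, and the assumed score is 0.75.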
Freshness based on the time distance. Assume that some articles in the archive of previously received articles are similar to article a and that the time distance between a and those similar articles is large. In this case, some fresh information is considered to have occurred, and so article a is considered to have a high freshness. So, we have
(4.9)
where t(a) is the broadcast time of article a.
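To make the time-distance idea concrete, the sketch below averages the gap between t(a) and the broadcast times of the similar articles. The averaging, the choice of hours as the unit, and the name `mean_time_distance_hours` are assumptions for illustration and are not taken from (4.9).

```python
from datetime import datetime
from typing import List


def mean_time_distance_hours(t_a: datetime, similar_times: List[datetime]) -> float:
    """Average absolute gap, in hours, between t(a) and similar articles' times.

    Assumption: a plain mean of |t(a) - t(b)|, standing in for (4.9).
    """
    if not similar_times:
        raise ValueError("at least one similar article is required")
    total_seconds = sum(abs((t_a - t_b).total_seconds()) for t_b in similar_times)
    return total_seconds / len(similar_times) / 3600.0
```

A larger value means the similar articles are further away in time, which the text treats as a sign of higher freshness.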
4.2 Popularity
To select valuable articles from the large quantity of news articles, the similarity
and dissimilarity of an article compared with previously selected articles should