news article pairs, describing the same event, we will mostly focus on n = 2
in the following sections. However, the most important results will also be
given for the other values of n.
Another parameter influencing the accuracy of discovered mates is the time
window within which the mate search is done. Increasing the time window
size also increases the number of candidates for the nearest-neighbor list. This
in turn means that for two articles to be selected as mates they must
pass stricter filters.
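The effect of the time window on mate selection can be illustrated with a minimal sketch. It assumes that two articles count as mates when each appears in the other's top-n nearest-neighbor list, and that similarity is the cosine between bag-of-words vectors; both are assumptions made for illustration, as are the names cosine, top_n, find_mates and the toy articles, none of which come from the text.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_n(query, candidates, n, window_days):
    """Indices of the n candidates most similar to query, within the time window."""
    day, bag = query
    scored = [
        (cosine(bag, c_bag), j)
        for j, (c_day, c_bag) in enumerate(candidates)
        if abs(c_day - day) <= window_days
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))  # most similar first, index breaks ties
    return [j for _, j in scored[:n]]


def find_mates(outlet_a, outlet_b, n, window_days):
    """Pairs (i, j) where each article is in the other's top-n list (assumed rule)."""
    a = [(day, Counter(text.lower().split())) for day, text in outlet_a]
    b = [(day, Counter(text.lower().split())) for day, text in outlet_b]
    a_lists = [set(top_n(art, b, n, window_days)) for art in a]
    b_lists = [set(top_n(art, a, n, window_days)) for art in b]
    return [(i, j) for i, js in enumerate(a_lists) for j in sorted(js) if i in b_lists[j]]


# Hypothetical toy data: (publication day, text) for each outlet.
outlet_a = [(0, "storm hits coast"), (20, "storm hits coast town")]
outlet_b = [(1, "storm hits coast town"), (21, "storm hits coast town flood")]

print(find_mates(outlet_a, outlet_b, n=1, window_days=3))   # [(0, 0), (1, 1)]
print(find_mates(outlet_a, outlet_b, n=1, window_days=30))  # [(1, 0)]
```

On this toy data, widening the window from 3 to 30 days adds competitors to every nearest-neighbor list, so one of the two pairs no longer satisfies the mutual condition. With n = 2 and the 30-day window, all four possible pairs qualify, which shows how a longer list loosens the filter.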
We ran the news matching algorithm for different sizes of the time window
and the top nearest-neighbor list. The results can be seen in Figure 2.1.
The results show that increasing the time window does reduce
the number of discovered pairs. The graph also shows that the reduction
is much more evident when the nearest-neighbor list is large, while it
hardly affects the smaller nearest-neighbor
lists. In the paper we will mostly focus on the case when n = 2 and the time
window is 15 days. The graph shows that further increasing the time
window for the case of n = 2 hardly influences the number of mates, which in
turn indicates that the selected mates are relatively accurate.
Note finally that this filtering stage is also likely to remove any potential
errors introduced by the story extraction phase, since it is unlikely that the
two outlets would have highly similar, time-correlated text in their navigation
menus or banners. At this point we have a list of 816 item pairs collected
over 1 year from CNN and Al Jazeera, from which we are rather confident
FIGURE 2.1: The window size is on the x axis and the number of discovered
mates is on the y axis. The graph shows the number of discovered mates for
nearest-neighbor lists of sizes n = 1, 2, 3, 5, 7, 10.