TABLE 2.4: Results for news outlet identification of a news item from the set of news item pairs for different sizes of time window. Nearest-neighbor list size is fixed to 2.

window size   BEP
5             85%
0             83%
5             81%
0             81%
0             80%
5             79%
0             79%
Table 2.3, we compared them against results obtained on randomly mixed news
article pairs (where the distinction between outlets was effectively removed).
The randomized pair sets were obtained by taking each pair of news articles
and swapping their outlets with probability 0.5. This generated a set in which
each story pair was the same as before, but the ordering within the pair was
essentially random.
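The randomization step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each pair is stored as a tuple whose first position is the CNN article and whose second is the Al Jazeera article, so that position plays the role of the outlet label. The function name `randomize_pairs` and its `seed` parameter are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical representation: (CNN article, Al Jazeera article).
# Swapping the two entries is equivalent to swapping their outlet labels.
Pair = Tuple[str, str]


def randomize_pairs(pairs: List[Pair], p: float = 0.5, seed: int = 0) -> List[Pair]:
    """Return a copy of `pairs` in which each pair is swapped with
    probability p. Each story pair keeps the same two articles, but
    which article sits in which position becomes random."""
    rng = random.Random(seed)  # seeded so a randomized set can be reproduced
    randomized = []
    for first, second in pairs:
        if rng.random() < p:
            randomized.append((second, first))  # swap the outlets
        else:
            randomized.append((first, second))  # keep the original order
    return randomized
```

Repeating this with different seeds yields independent randomized sets (300 of them in the experiment described here), each of which would then be passed through the same training and BEP evaluation as the real data.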
The permutation test was run on 300 random sets for n = 1, ..., 10, and
it never returned a result better than the one from Table 2.3. For a sample
distribution of BEP obtained on 300 random sets for n = 2, see Figure 2.2.
Comparing the outlet identification results against the random runs gives a
p-value of 0.3%. It is therefore very unlikely that the outlet identification
results are due to chance; they must reflect a true difference in the
probability distributions over words associated with each news outlet. As we
argued before, this indicates that there is a significant bias in the
vocabulary that Al Jazeera and CNN use to describe the same events.
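The text does not say exactly how the p-value was computed from the 300 random runs. One common estimator for a permutation test, sketched below as an assumption, counts the k randomized runs whose BEP is at least as good as the observed BEP and returns (k + 1) / (N + 1). With k = 0 and N = 300 this gives 1/301, about 0.33%, which agrees with the reported 0.3% (the plain ratio 1/300 rounds to the same value). The function name and inputs are hypothetical.

```python
from typing import Sequence


def permutation_p_value(observed_bep: float, random_beps: Sequence[float]) -> float:
    """One-sided empirical p-value for a permutation test.

    observed_bep: BEP obtained on the real (non-randomized) pairs.
    random_beps:  BEP obtained on each randomized pair set.
    Uses the (k + 1) / (N + 1) estimate, which is never exactly zero.
    """
    n = len(random_beps)
    if n == 0:
        raise ValueError("need at least one randomized run")
    # Number of random runs that did at least as well as the real data.
    k = sum(1 for bep in random_beps if bep >= observed_bep)
    return (k + 1) / (n + 1)
```

The +1 terms count the observed labeling as one of the possible permutations, which keeps the estimate conservative when no random run beats it.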
To shed some light on the vocabulary bias, we extracted the most important
words from the linear SVM classifier for n = 2. These are the words associated
with the largest coefficients of the primal weight vector w of the SVM, and
hence the terms that most affect the decision made by the classifier. We
obtain the following two lists:
Keywords for CNN: insurgency, militants, troops, hussein, iran,
baghdad, united, terrorists, police, united state, suicide, program, al
qaeda, national, watching, qaeda, baghdad iraq, wounded, palestinians,
al
Keywords for Al Jazeera: iraq, attacks, army, shia, occupation,
withdraw, demanded, americans, claim, mr, nuclear, muslim, saddam,
resistance, agency, fighters, rebels, iraqis, foreign, correspondent
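The keyword extraction described above can be sketched as ranking the vocabulary by its entries in w: the largest positive weights push the decision toward one outlet and the most negative weights toward the other. This is a minimal sketch under assumptions: `vocabulary[i]` is the term for feature i, and which sign corresponds to which outlet depends on how the classes were labeled when training. For instance, with scikit-learn's `LinearSVC` on two classes, `w` would be `clf.coef_[0]` and positive weights favor `clf.classes_[1]`. The function name `top_terms` is hypothetical.

```python
from typing import List, Sequence, Tuple


def top_terms(vocabulary: Sequence[str], w: Sequence[float], k: int = 20
              ) -> Tuple[List[str], List[str]]:
    """Return (positive_terms, negative_terms) from a linear classifier.

    positive_terms: up to k terms with the largest positive weights.
    negative_terms: up to k terms with the most negative weights,
                    most negative first.
    """
    if len(vocabulary) != len(w):
        raise ValueError("vocabulary and weight vector must have equal length")
    # Sort (weight, term) pairs from largest to smallest weight.
    ranked = sorted(zip(w, vocabulary), key=lambda t: t[0], reverse=True)
    positive = [term for weight, term in ranked[:k] if weight > 0]
    # The tail of the ranking holds the most negative weights; reverse it
    # so the strongest evidence for the negative class comes first.
    negative = [term for weight, term in reversed(ranked[-k:]) if weight < 0]
    return positive, negative
```

Ranking by signed weight rather than absolute value is what separates the two lists: each term ends up in the list of the outlet its weight points toward, and terms with zero weight appear in neither.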
While the experimental findings above are significant and reproducible, we
believe it can also be useful to attempt an interpretation of these results,
based on an inspection of the specific terms isolated by this analysis. This
interpretation is, of course, a subjective reading of objective results.
Comparing the lists, we notice that CNN is more inclined to use words like
'insurgency,' 'militants,' and 'terrorists' when describing Iraqis, words that
arguably carry a negative connotation. Al Jazeera, on the other hand, seems
more likely to use words like 'resistance,' 'fighters,' and 'rebels' when
describing the same events.
 