FIGURE 2.2: Distribution of BEP for 300 random sets.
We can also see that CNN uses terrorism-related words such as 'al-qaeda' or
'suicide' more often than Al Jazeera. Al Jazeera apparently focuses more on
'withdraw.' Interestingly, the word 'Hussein' is more characteristic of CNN,
while the word 'Saddam' is more characteristic of Al Jazeera, although both
words refer to the same person.
2.5 Topic-Wise Comparison of Term Bias
Using a method borrowed from statistical cross-language analysis, we can
compare the data generated by the two news outlets as if it were written
in different languages. Kernel Canonical Correlation Analysis (kCCA) [see
Appendix C] (14) is a method for correlating two multidimensional random
variables, which is how our documents are modelled in the vector space
approach. It has been used to analyze bilingual corpora, extracting both topics
from the corpora and semantically related pairs of words in the two languages
(15) (7). We are interested in discovering whether there are specific term-choice
biases in certain topics, but we want to discover these topics automatically.
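To make the idea of correlating two paired views concrete, the following is a minimal sketch of a regularized kCCA with linear kernels, written with NumPy only. It is not the implementation used in the experiments: the function name linear_kcca, the regularization value kappa, and the synthetic data are all assumptions made for illustration. The eigenproblem used is one common regularized formulation of kCCA, and with a linear kernel the dual vectors can be mapped back to per-term weights.

```python
import numpy as np

def linear_kcca(X, Y, kappa=1.0):
    """Top regularized kCCA direction for two paired views (linear kernels).

    X, Y: (n_docs, n_terms) arrays whose rows are paired documents.
    kappa: regularization strength (an illustrative choice, not from the text).
    """
    # Centering the features centers the linear kernels as well.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Kx = Xc @ Xc.T
    Ky = Yc @ Yc.T
    I = np.eye(Kx.shape[0])

    # One common regularized formulation:
    # (Kx + kappa I)^-1 Ky (Ky + kappa I)^-1 Kx alpha = lambda^2 alpha
    M = np.linalg.solve(Kx + kappa * I, Ky) @ np.linalg.solve(Ky + kappa * I, Kx)
    eigvals, eigvecs = np.linalg.eig(M)
    top = np.argmax(eigvals.real)
    alpha = eigvecs[:, top].real

    # The matching dual vector for the second view (up to scale).
    beta = np.linalg.solve(Ky + kappa * I, Kx @ alpha)

    # Correlation between the two projected views.
    corr = np.corrcoef(Kx @ alpha, Ky @ beta)[0, 1]

    # With a linear kernel, dual vectors map back to weights over terms.
    terms_x = Xc.T @ alpha
    terms_y = Yc.T @ beta
    return alpha, beta, corr, terms_x, terms_y

# Synthetic paired data (hypothetical): one hidden topic drives term 0 in
# the first view and term 1 in the second, mimicking two outlets that
# prefer different words for the same thing.
rng = np.random.default_rng(0)
n_docs, n_terms = 40, 6
topic = rng.normal(size=n_docs)
w_a = np.array([1.0, 0.0, 0.5, 0.0, 0.0, 0.0])
w_b = np.array([0.0, 1.0, 0.5, 0.0, 0.0, 0.0])
X = np.outer(topic, w_a) + 0.01 * rng.normal(size=(n_docs, n_terms))
Y = np.outer(topic, w_b) + 0.01 * rng.normal(size=(n_docs, n_terms))

alpha, beta, corr, terms_a, terms_b = linear_kcca(X, Y)
print("correlation of projections:", round(float(corr), 3))
print("strongest term, view A:", int(np.argmax(np.abs(terms_a))))
print("strongest term, view B:", int(np.argmax(np.abs(terms_b))))
```

In this toy setting the strongest term differs between the two views even though both projections track the same hidden topic, which is the kind of term-choice difference the comparison is meant to surface. On real term vectors the kernel matrices grow quadratically with the number of document pairs, so a practical implementation would likely need a low-rank approximation.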
In our experiments we used the set of news pairs obtained with n = 2 as
a paired dataset for kCCA. Both news outlets use the same language so we
could use the same bag-of-words space for each view. The output of kCCA is a
 