set of pairs of vectors along which the news article pairs are highly correlated.
Each pair of vectors corresponds to one of the topics from the news collection;
this can be observed by checking the most important keywords in the vectors.
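The pairs of vectors come from kernel canonical correlation analysis (kCCA), and the source does not give its exact kernel or regularization settings. As a rough illustration only, the sketch below implements one common regularized kCCA formulation in NumPy. It assumes linear kernels over bag-of-words matrices `X` (CNN articles) and `Y` (the paired Al Jazeera articles), with row i of each describing the same news pair and both sharing one vocabulary. The regularizer `kappa`, the small ridge `eps`, and the helper names `center_kernel`, `kcca` and `to_word_space` are hypothetical choices. With a linear kernel, each dual weight vector maps back to a vector of word weights, which is what makes it possible to read off the most important keywords of each topic.

```python
import numpy as np


def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kcca(Kx, Ky, kappa=0.1, n_components=2, eps=1e-6):
    """Regularized kernel CCA (hypothetical settings) via a symmetric eigenproblem.

    Kx, Ky: n x n kernel matrices for the two views; row i of each is the same pair.
    Solves [0, KxKy; KyKx, 0] v = rho * blockdiag(Kx^2 + kappa*Kx, Ky^2 + kappa*Ky) v.
    Returns (correlations, alphas, betas); columns of alphas/betas are dual weights.
    """
    n = Kx.shape[0]
    Kx, Ky = center_kernel(Kx), center_kernel(Ky)
    Z = np.zeros((n, n))
    # Left-hand side: cross-view coupling.
    A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
    # Right-hand side: regularized within-view variance, plus a tiny ridge so B is invertible.
    Bx = Kx @ Kx + kappa * Kx + eps * np.eye(n)
    By = Ky @ Ky + kappa * Ky + eps * np.eye(n)
    B = np.block([[Bx, Z], [Z, By]])
    # Reduce A v = rho B v to a standard symmetric problem using B^{-1/2}.
    w, V = np.linalg.eigh(B)
    B_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    vals, U = np.linalg.eigh(B_inv_sqrt @ A @ B_inv_sqrt)
    order = np.argsort(vals)[::-1][:n_components]  # largest correlations first
    coeffs = B_inv_sqrt @ U[:, order]
    return vals[order], coeffs[:n], coeffs[n:]


def to_word_space(X, alphas):
    """For a linear kernel K = X X^T, map dual weights back to one weight per word."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ alphas
```

Calling `to_word_space(X, alphas)` and `to_word_space(Y, betas)` yields, column by column, the CNN and Al Jazeera word-weight vectors of each correlated pair.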
For each pair of vectors, we took the Al Jazeera vector and subtracted it from
the CNN vector. We then sorted the words according to the weight they
had in this difference vector. If a word had a highly positive weight, then it was
more biased towards CNN, and vice versa. Again, this is a way to compare
specific differences between the two probability distributions underlying
the generation of words in CNN and Al Jazeera.
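The subtraction and ranking step can be sketched as follows, assuming the CNN and Al Jazeera vectors of one kCCA pair are arrays indexed over a shared vocabulary list. The function name `lexical_bias` and the cutoff `k` are hypothetical.

```python
import numpy as np


def lexical_bias(cnn_vec, aj_vec, vocab, k=10):
    """Rank words by (CNN weight - Al Jazeera weight) for one kCCA vector pair.

    Returns (cnn_leaning, aj_leaning): the k words with the most positive
    difference (more biased towards CNN) and the k words with the most
    negative difference (more biased towards Al Jazeera).
    """
    diff = np.asarray(cnn_vec, dtype=float) - np.asarray(aj_vec, dtype=float)
    order = np.argsort(diff)  # ascending: most AJ-leaning words first
    cnn_leaning = [vocab[i] for i in order[::-1][:k]]
    aj_leaning = [vocab[i] for i in order[:k]]
    return cnn_leaning, aj_leaning
```

Reading the two lists side by side gives, per topic, the outlet-specific vocabulary of the kind listed for CNN and Al Jazeera.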
From each pair of vectors we also composed a set of outlet-independent
main keywords describing that topic. This was done by taking the union of
the top 5 keywords from each of the two vectors.
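A minimal sketch of building the outlet-independent keyword set, under the same shared-vocabulary assumption. It also assumes that large positive weights mark the most important words; since the joint sign of a kCCA vector pair is arbitrary, both vectors may need to be flipped first. `topic_keywords` is a hypothetical name, and `k=5` mirrors the top 5 keywords described above.

```python
import numpy as np


def topic_keywords(cnn_vec, aj_vec, vocab, k=5):
    """Union of the top-k words (by weight) of the CNN and Al Jazeera vectors of one topic."""
    top_cnn = np.argsort(np.asarray(cnn_vec, dtype=float))[::-1][:k]
    top_aj = np.argsort(np.asarray(aj_vec, dtype=float))[::-1][:k]
    # Set union: words that rank highly in either outlet's vector.
    return {vocab[i] for i in top_cnn} | {vocab[i] for i in top_aj}
```

Taking the union rather than the intersection keeps a topic word even when only one outlet weights it heavily.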
In Table 2.5 we present a list of the top 10 topics discovered by kCCA. For
each topic there is a set of keywords that describe the topic and a set of
topic-related keywords specific to CNN and to Al Jazeera.
The difference in vocabulary that can be seen in Table 2.5 is similar
to the one we already found in the previous section using the support
vector machine. This is of course encouraging, as it suggests we detected a
real signal in the data. An important advantage of analysis based on kCCA is
that it adds a crucial extra piece of information: how the lexical bias
depends on the topics being discussed. kCCA automatically identifies the
main topics, and for each topic the lexical bias between outlets discussing it.
Notice that the 'topics' identified by kCCA (or by any other factor analysis
method) do not need to correspond to topics that are meaningful in the human
sense, although they often are. Attributing a human-meaningful topic to a coherent set
of keywords found by kCCA involves some amount of interpretation
of the results, and so it can be considered a subjective step. However, it should
be noted that, while we do attempt to interpret the topics found by kCCA,
this interpretation is not necessary for any step of the analysis.
The topics common to AJ and CNN, as separated by kCCA, seem
to be fairly coherent and cover essentially all the key issues in the Middle
East in 2005, although some topics are a little less focused (see Table 2.5):
1) Iran's nuclear program; 2) Iraq's insurgency; 3) Palestinian question and
Gaza; 4) Iran's nuclear program; 5) Iraq and Palestine; 6) Lebanon and Syria;
7) Afghanistan, Guantanamo, Pakistan; 8) Iraq and Saddam's trial; 9) Human
rights abuses; 10) Sharm el Sheik's terror attack.
The table gives an idea of the main differences between the lexicons AJ and CNN
use to report on the same events. A good example is perhaps Topic
3, where CNN more often uses words like 'militants,' 'missiles,' and 'launch,'
while AJ more often uses words like 'settlers,' 'barriers,' 'farms,' and
'suffer,' suggesting a difference in focus.