are already paired, and the two elements of each pair are definitely written
by authors adopting different perspectives. This makes the signal stronger
and the analysis very interesting. In both cases, it is found that statistical
learning algorithms (both generative and discriminative models) can
be used to identify the perspective from which an article has been written.
By contrast, in our study we have a somewhat opposite situation. Not only
are the articles not naturally paired, but they are also not on a specific topic
(as those in bitterlemons) or written by a specific author (as in the case of
presidential debates). Furthermore, there is no obvious reason to assume that
any two news outlets should show a measurable bias in their choice of terms,
when reporting on the same events. This makes the signal much harder to
isolate, and indeed the automatic identification of topics by using kCCA is
very helpful in showing the most biased topics. From a methodological point
of view, our use of concepts from machine translation and cross-language
retrieval can provide a complementary position to the methods purely based
on text categorization that have been proposed so far.
Somewhat related to the above is the paper (12), where the task is to
analyze the transcripts of U.S. Congressional floor debates to determine
whether the speeches represent support of or opposition to proposed
legislation. Rather than paired documents, here we just have labelled
documents, but the label relates to the attitude of the speaker. This
is again cast as a text categorization task, where the authors use SVM
classifiers, again showing that statistical discrimination algorithms can
capture the subtle signals contained in the choice of words that relate to opinion.
A more indirect relation to this theme can also be found within the grow-
ing literature on sentiment analysis, or opinion analysis, where the author's
attitude towards a topic or a product is extracted. In these studies, it is typically
the presence of specific keywords that is used to determine the attitude
of the writer towards a certain issue. Projecting documents onto a subspace
spanned by polarized words may be a way to simplify and direct the search
for lexical bias in news outlets.
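The projection mentioned above can be sketched as follows. This is a minimal illustration, not a method from this chapter: it assumes a bag-of-words representation, in which each polarized word is a unit vector in term space, so projecting a document onto the subspace they span simply keeps the counts of those words. The word list `POLARIZED_WORDS`, the helper names, and the example sentences are all hypothetical.

```python
from collections import Counter

# Hypothetical polarized vocabulary, chosen only for illustration.
POLARIZED_WORDS = ["insurgents", "militants", "terrorists", "fighters", "resistance"]


def bag_of_words(text):
    """Lowercase the text, split on whitespace, strip simple punctuation, count terms."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


def project(text, basis=POLARIZED_WORDS):
    """Return the document's coordinates in the subspace spanned by `basis`.

    Each basis word is a standard unit vector in term space, so the
    orthogonal projection just keeps the counts of those words.
    """
    counts = bag_of_words(text)
    return [counts[w] for w in basis]


# Made-up sentences describing the same event with different terms.
doc_a = "Militants attacked the convoy; the militants later withdrew."
doc_b = "Resistance fighters attacked the convoy."

vec_a = project(doc_a)
vec_b = project(doc_b)
```

Two documents that report the same event but differ in their projected coordinates differ precisely in their use of the chosen polarized terms, which is the kind of lexical bias the search would be directed towards.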
The author identification literature is also indirectly related, as in our
experiments we establish the presence of a lexical or stylistic bias by
identifying the outlet (author) based on a text.
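The idea of identifying the outlet from a text can be illustrated with a toy classifier. The sketch below uses a multinomial naive Bayes model with add-one smoothing, chosen only because it fits in a few lines of standard-library Python; it is not claimed to be the classifier used in our experiments. The outlet names, the training sentences, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


def train(labelled_docs):
    """Collect per-outlet term counts from (outlet, text) pairs."""
    term_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for outlet, text in labelled_docs:
        tokens = tokenize(text)
        term_counts[outlet].update(tokens)
        doc_counts[outlet] += 1
        vocab.update(tokens)
    return term_counts, doc_counts, vocab


def predict(model, text):
    """Return the outlet with the highest smoothed log-probability for the text."""
    term_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # ignore unseen terms
    best_outlet, best_score = None, -math.inf
    for outlet in sorted(doc_counts):
        total_terms = sum(term_counts[outlet].values())
        score = math.log(doc_counts[outlet] / total_docs)  # class prior
        for tok in tokens:
            # Add-one (Laplace) smoothing over the shared vocabulary.
            score += math.log((term_counts[outlet][tok] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_outlet, best_score = outlet, score
    return best_outlet


# Made-up training data: two hypothetical outlets with different term choices.
training = [
    ("outlet_a", "militants attacked the convoy"),
    ("outlet_a", "militants fired on troops"),
    ("outlet_b", "fighters attacked the convoy"),
    ("outlet_b", "fighters fired on forces"),
]
model = train(training)
guess_1 = predict(model, "the militants attacked")
guess_2 = predict(model, "fighters fired")
```

If such a classifier identifies the outlet on held-out articles more often than chance would allow, the outlets must differ in their lexical or stylistic choices, which is the argument made above.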
2.8 Conclusion
We have presented a fully automatic method for the analysis of term-choice
bias in media outlets, using state-of-the-art technology in information
extraction and pattern analysis. Our automated analysis has uncovered the
existence of a statistically significant lexical difference between CNN and Al
 