statistical classifier. Their best results, on both the sentiment and arguing classification tasks, are
found by using the basic BOW approach combined with the lexicons and the dialogue information.
Also on the AMI corpus, Raaijmakers et al. [2008] approached the problem of detecting
subjectivity in meeting speech by using a variety of multi-modal features such as prosodic features,
word n-grams, character n-grams and phoneme n-grams. For subjectivity detection, they found that
a combination of all features was best, while prosodic features were less useful for discriminating
between positive and negative utterances. They found character n-grams to be particularly useful.
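The character n-grams that Raaijmakers et al. found so useful can be illustrated with a minimal extractor; this is a generic sketch of the technique, not their system, and the function name and n-gram lengths are illustrative choices:

```python
from collections import Counter

def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of lengths n_min..n_max in an utterance."""
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

# Sub-word units such as "grea" or "awfu" can survive the disfluencies
# and recognition errors common in transcribed meeting speech, which is
# one intuition for why character n-grams help on this kind of data.
features = char_ngrams("yeah that sounds great")
```

Such counts would then feed a statistical classifier alongside the word, phoneme, and prosodic features.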
Murray and Carenini [2010] address the same tasks of subjectivity detection and polarity classification as Raaijmakers et al., but on both the AMI corpus and the BC3 email corpus. Because they are interested in both spoken and written conversations, their system does not exploit prosodic features as the system of Raaijmakers et al. does, but they nonetheless achieve comparable performance on
the AMI corpus. In addition to fixed-sequence n-grams, the authors also introduce varying instantiation n-grams, where each unit of the n-gram can be either a word or the word's part-of-speech tag, and make use of lexico-syntactic patterns output by the Riloff and Phillips [2004] algorithm. One
finding is that detecting negative polarity sentences is much more difficult than the other sentiment
detection tasks, owing partly to the fact that these sentences are relatively rare and can be manifested
very subtly. This is particularly true of face-to-face meetings such as the AMI corpus, where negative
sentences are not common and seem rarely to be signaled by overt lexical cues.
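The varying-instantiation n-grams described above can be sketched as follows; this is a minimal illustration of the idea under the assumption that tokens arrive as (word, POS-tag) pairs, with an invented function name:

```python
from itertools import product

def varying_instantiation_ngrams(tagged, n=2):
    """For each n-gram of (word, POS) pairs, emit every pattern in which
    each position is instantiated as either the word or its POS tag."""
    patterns = set()
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        # product over the pairs picks word or tag at each slot,
        # giving 2**n patterns per n-gram window.
        for choice in product(*window):
            patterns.add(choice)
    return patterns

# ("really", "ADV"), ("bad", "ADJ") yields ("really", "bad"),
# ("really", "ADJ"), ("ADV", "bad") and ("ADV", "ADJ").
```

Mixing lexical and part-of-speech slots lets a single pattern generalize across sentences that share structure but not vocabulary.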
Carenini et al. [2008] are not only interested in detecting subjectivity in emails, but in exploiting that subjectivity information to aid an email summarization system. They take a lexicon-based approach to detecting subjective words and phrases, using existing sentiment dictionaries [Kim and Hovy, 2005, Wilson et al., 2005] and combining measures of subjectivity with measures of lexical cohesion to obtain their best results. Another email summarization system is that
of Wan and McKeown [2004], who do not specifically model sentiment but do attempt to summarize decision-based discussions featuring agreements and disagreements; they have annotated their email corpus for such phenomena, to be exploited in future work. Email summarization systems are described in much more detail in Chapter 4.
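A lexicon-based subjectivity score of the kind Carenini et al. combine with lexical cohesion can be sketched as follows; the tiny lexicon and function name here are illustrative stand-ins, not the Kim and Hovy or Wilson et al. dictionaries:

```python
# Illustrative mini-lexicon; real systems use dictionaries with
# thousands of entries, often annotated for polarity and strength.
SUBJECTIVE = {"great": 1.0, "terrible": 1.0, "think": 0.5, "maybe": 0.5}

def subjectivity_score(sentence):
    """Average lexicon weight over the tokens of a sentence; higher
    values suggest a more subjective sentence. A summarizer can combine
    this score with other measures, such as lexical cohesion."""
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    return sum(SUBJECTIVE.get(t, 0.0) for t in tokens) / len(tokens)
```

Sentences scoring above a threshold would then be treated as subjective candidates for inclusion in the summary.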
There has been a great deal of sentiment and opinion mining research focused on blogs, albeit
at a very large-scale, coarse-granularity level. Some of this research attempts to capture a snapshot
of the overall blogosphere mood; for example, Mishne and de Rijke [2006] analyze over 8 million LiveJournal⁷ posts in order to capture a “blogosphere state-of-mind”. The authors learn textual sentiment features by taking advantage of the fact that, in their corpus, many bloggers indicate their mood at the time of each blog post, and the data can therefore be treated as labeled. Mishne and Glance [2006] try to predict movie sales by analyzing the sentiment of blog posts mentioning the movie, and
found that considering sentiment improved results over a baseline that only analyzed the volume of
postings.
Much of the work on blog opinion mining has emerged under the umbrella of the Text Retrieval Conference (TREC, now the Text Analysis Conference, TAC). Beginning in 2006, TREC
7 http://www.livejournal.com/