statistical classifier. Their best results, on both the sentiment and arguing classification tasks, are
found by using the basic BOW approach combined with the lexicons and the dialogue information.
Also on the AMI corpus, Raaijmakers et al. [2008] approached the problem of detecting
subjectivity in meeting speech by using a variety of multi-modal features such as prosodic features,
word n-grams, character n-grams and phoneme n-grams. For subjectivity detection, they found that
a combination of all features was best, while prosodic features were less useful for discriminating
between positive and negative utterances. They found character n-grams to be particularly useful.
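The character n-grams that Raaijmakers et al. found so useful can be illustrated with a minimal extractor; this is a generic sketch of the technique, not their system, and the function name and n-gram lengths are illustrative choices:

```python
from collections import Counter

def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of lengths n_min..n_max in an utterance."""
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

# Sub-word units such as "grea" or "awfu" can survive the disfluencies
# and recognition errors common in transcribed meeting speech, which is
# one intuition for why character n-grams help on this kind of data.
features = char_ngrams("yeah that sounds great")
```

Such counts would then feed a statistical classifier alongside the word, phoneme, and prosodic features.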
Murray and Carenini [2010] address the same tasks of subjectivity detection and polarity classification as Raaijmakers et al., but on both the AMI corpus and the BC3 email corpus. Because they are interested in both spoken and written conversations, their system does not exploit prosodic features as the system of Raaijmakers et al. does, but they nonetheless achieve comparable performance on
the AMI corpus. In addition to fixed-sequence n-grams, the authors also introduce varying instantiation n-grams, where each unit of the n-gram can be either a word or the word's part-of-speech tag, and make use of lexico-syntactic patterns output by the Riloff and Phillips [2004] algorithm. One
finding is that detecting negative polarity sentences is much more difficult than the other sentiment
detection tasks, owing partly to the fact that these sentences are relatively rare and can be manifested
very subtly. This is particularly true of face-to-face meetings such as the AMI corpus, where negative
sentences are not common and seem rarely to be signaled by overt lexical cues.
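The varying-instantiation n-grams described above can be sketched as follows; this is a minimal illustration of the idea under the assumption that tokens arrive as (word, POS-tag) pairs, with an invented function name:

```python
from itertools import product

def varying_instantiation_ngrams(tagged, n=2):
    """For each n-gram of (word, POS) pairs, emit every pattern in which
    each position is instantiated as either the word or its POS tag."""
    patterns = set()
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        # product over the pairs picks word or tag at each slot,
        # giving 2**n patterns per n-gram window.
        for choice in product(*window):
            patterns.add(choice)
    return patterns

# ("really", "ADV"), ("bad", "ADJ") yields ("really", "bad"),
# ("really", "ADJ"), ("ADV", "bad") and ("ADV", "ADJ").
```

Mixing lexical and part-of-speech slots lets a single pattern generalize across sentences that share structure but not vocabulary.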
Carenini et al. [2008] are not only interested in detecting subjectivity in emails, but in exploiting that subjectivity information to aid an email summarization system. They take a lexicon-based approach to detecting subjective words and phrases, using existing sentiment dictionaries [Kim and Hovy, 2005, Wilson et al., 2005] and combining measures of subjectivity with measures of lexical cohesion to obtain their best results. Another email summarization system is that
of Wan and McKeown [2004], who do not specifically model sentiment but do attempt to summarize decision-based discussions featuring agreements and disagreements; they have annotated their email corpus for such phenomena, to be exploited in future work. Email summarization systems are described in much more detail in Chapter 4.
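A lexicon-based subjectivity score of the kind Carenini et al. combine with lexical cohesion can be sketched as follows; the tiny lexicon and function name here are illustrative stand-ins, not the Kim and Hovy or Wilson et al. dictionaries:

```python
# Illustrative mini-lexicon; real systems use dictionaries with
# thousands of entries, often annotated for polarity and strength.
SUBJECTIVE = {"great": 1.0, "terrible": 1.0, "think": 0.5, "maybe": 0.5}

def subjectivity_score(sentence):
    """Average lexicon weight over the tokens of a sentence; higher
    values suggest a more subjective sentence. A summarizer can combine
    this score with other measures, such as lexical cohesion."""
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    return sum(SUBJECTIVE.get(t, 0.0) for t in tokens) / len(tokens)
```

Sentences scoring above a threshold would then be treated as subjective candidates for inclusion in the summary.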
There has been a great deal of sentiment and opinion mining research focused on blogs, albeit
at a very large-scale, coarse-granularity level. Some of this research attempts to capture a snapshot
of the overall blogosphere mood; for example, Mishne and de Rijke [2006] analyze over 8 million LiveJournal⁷ posts in order to capture a “blogosphere state-of-mind”. The authors learn textual sentiment features by taking advantage of the fact that, in their corpus, many bloggers indicate their mood at the time of each blog post, and the data can therefore be treated as labeled. Mishne and Glance [2006] try to predict movie sales by analyzing the sentiment of blog posts mentioning the movie, and
found that considering sentiment improved results over a baseline that only analyzed the volume of
postings.
Much of the work on blog opinion mining has emerged under the umbrella of the Text Retrieval Conference (TREC, now the Text Analysis Conference, TAC). Beginning in 2006, TREC
7 http://www.livejournal.com/