Note that an absolute sentiment level is not necessarily informative on its own. Instead,
a baseline should be established and then compared against the latest observed
values. For example, a ratio of 40% positive to 60% negative tweets on a topic
is not necessarily a sign that a product is unsuccessful if other similar,
successful products show a comparable ratio, which may simply reflect the
psychology of when people choose to tweet.
The previous example demonstrates how to use naïve Bayes to perform sentiment
analysis. The example can be applied to tweets on ACME's bPhone and bEbook
simply by replacing the movie review corpus with the pretagged tweets. Other
classifiers can also be used in place of naïve Bayes.
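If the earlier movie-review example built (features, label) pairs for a naïve Bayes classifier, the corpus swap described above amounts to producing the same pairs from tagged tweets instead. The sketch below illustrates this; the tweets, tags, and the bag-of-words presence features are invented placeholders, not part of any actual ACME dataset.

```python
# Sketch: replacing the movie review corpus with pretagged tweets.
# The tweet texts and tags below are hypothetical examples; in practice
# the pairs would come from a manually labeled collection of tweets.
tagged_tweets = [
    ("the bPhone camera is amazing", "positive"),
    ("my bEbook screen cracked after a week", "negative"),
]

def word_features(text):
    """Bag-of-words presence features, the same dictionary shape that
    NLTK's NaiveBayesClassifier.train() accepts."""
    return {word: True for word in text.lower().split()}

featuresets = [(word_features(text), tag) for text, tag in tagged_tweets]

# featuresets can now be fed to the same training call used for the
# movie reviews, e.g. nltk.NaiveBayesClassifier.train(featuresets),
# or reshaped for another classifier such as MaxEnt or SVM.
```

Because the classifier only sees the feature dictionaries and labels, the same pipeline works unchanged whether the underlying documents are movie reviews or tweets.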
The movie review corpus contains only 2,000 reviews, so it is relatively
easy to tag each review manually. For sentiment analysis over larger volumes
of streaming data, such as millions or billions of tweets, it is much less
feasible to collect and manually tag enough tweets to train and test one or
more classifiers. There are two popular ways to cope with this problem. The
first way to construct pretagged data, as illustrated in recent work by Go
et al. [41] and Pak and Paroubek [42], is to apply distant supervision and
treat emoticons such as :) and :( as noisy labels indicating whether a tweet
contains positive or negative sentiment. Words from these tweets can in turn
be used as clues to classify the sentiment of future tweets. Go et al. [41]
apply classification methods including naïve Bayes, MaxEnt, and SVM to the
resulting training and testing datasets. Their demo is available at
http://www.sentiment140.com. Figure 9.6 shows the sentiments resulting
from a query against the term “Boston weather” on a set of tweets. Viewers can
mark the result as accurate or inaccurate, and such feedback can be incorporated
in future training of the algorithm.
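The emoticon-based labeling idea can be sketched end to end in a few lines: use the emoticon as a noisy class label, strip it from the text, and train a simple multinomial naïve Bayes model on the remaining words. The sample tweets, tokenization, and add-one smoothing below are illustrative assumptions, not details from the cited papers.

```python
# Sketch of emoticon-based distant supervision for tweet sentiment.
# Sample tweets are hypothetical; real systems train on millions of tweets.
import math
from collections import Counter, defaultdict

raw_tweets = [
    "love the new bPhone :)",
    "best purchase ever :)",
    "my bEbook broke again :(",
    "worst battery life :(",
]

def label_and_clean(tweet):
    """Use the emoticon as a noisy label, then remove it from the text."""
    if ":)" in tweet:
        return tweet.replace(":)", "").split(), "positive"
    if ":(" in tweet:
        return tweet.replace(":(", "").split(), "negative"
    return None  # no emoticon: tweet cannot be auto-labeled

labeled = []
for t in raw_tweets:
    result = label_and_clean(t)
    if result is not None:
        labeled.append(result)

# Count words per class for multinomial naive Bayes with add-one smoothing.
word_counts = defaultdict(Counter)
class_counts = Counter()
for words, label in labeled:
    class_counts[label] += 1
    word_counts[label].update(words)

vocab = {w for counts in word_counts.values() for w in counts}

def classify(words):
    """Pick the class with the highest log-probability score."""
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for w in words:
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("love my bPhone".split()))  # -> positive
print(classify("battery broke".split()))   # -> negative
```

The emoticons themselves never appear as features; only the surrounding words do, which is what lets the model generalize to future tweets that contain no emoticon at all.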