Note that an absolute sentiment level is not necessarily informative on its own. Instead,
a baseline should be established and then compared against the latest observed
values. For example, a ratio of 40% positive to 60% negative tweets on a topic
is not necessarily a sign that a product is unsuccessful if other similar,
successful products show a comparable ratio, which may simply reflect the
psychology of when people choose to tweet.
The previous example demonstrates how to use naïve Bayes to perform sentiment
analysis. The example can be applied to tweets on ACME's bPhone and bEbook
simply by replacing the movie review corpus with the pretagged tweets. Other
classifiers can also be used in place of naïve Bayes.
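If the earlier movie-review example built (features, label) pairs for a naïve Bayes classifier, the corpus swap described above amounts to producing the same pairs from tagged tweets instead. The sketch below illustrates this; the tweets, tags, and the bag-of-words presence features are invented placeholders, not part of any actual ACME dataset.

```python
# Sketch: replacing the movie review corpus with pretagged tweets.
# The tweet texts and tags below are hypothetical examples; in practice
# the pairs would come from a manually labeled collection of tweets.
tagged_tweets = [
    ("the bPhone camera is amazing", "positive"),
    ("my bEbook screen cracked after a week", "negative"),
]

def word_features(text):
    """Bag-of-words presence features, the same dictionary shape that
    NLTK's NaiveBayesClassifier.train() accepts."""
    return {word: True for word in text.lower().split()}

featuresets = [(word_features(text), tag) for text, tag in tagged_tweets]

# featuresets can now be fed to the same training call used for the
# movie reviews, e.g. nltk.NaiveBayesClassifier.train(featuresets),
# or reshaped for another classifier such as MaxEnt or SVM.
```

Because the classifier only sees the feature dictionaries and labels, the same pipeline works unchanged whether the underlying documents are movie reviews or tweets.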
The movie review corpus contains only 2,000 reviews, so it is relatively
easy to tag each review manually. For sentiment analysis over larger volumes
of streaming data, such as millions or billions of tweets, it is much less
feasible to collect and manually tag enough tweets to train and test one or
more classifiers. There are two popular ways to cope with this problem. The
first way to construct pretagged data, as illustrated in recent work by Go
et al. [41] and Pak and Paroubek [42], is to apply distant supervision and
treat emoticons such as :) and :( as noisy labels indicating whether a tweet
contains positive or negative sentiment. Words from these tweets can in turn
be used as clues to classify the sentiment of future tweets. Go et al. [41]
apply classification methods including naïve Bayes, MaxEnt, and SVM to the
resulting training and testing datasets. Their demo is available at
http://www.sentiment140.com. Figure 9.6 shows the sentiments resulting
from a query against the term “Boston weather” on a set of tweets. Viewers can
mark the result as accurate or inaccurate, and such feedback can be incorporated
in future training of the algorithm.
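The emoticon-based labeling idea can be sketched end to end in a few lines: use the emoticon as a noisy class label, strip it from the text, and train a simple multinomial naïve Bayes model on the remaining words. The sample tweets, tokenization, and add-one smoothing below are illustrative assumptions, not details from the cited papers.

```python
# Sketch of emoticon-based distant supervision for tweet sentiment.
# Sample tweets are hypothetical; real systems train on millions of tweets.
import math
from collections import Counter, defaultdict

raw_tweets = [
    "love the new bPhone :)",
    "best purchase ever :)",
    "my bEbook broke again :(",
    "worst battery life :(",
]

def label_and_clean(tweet):
    """Use the emoticon as a noisy label, then remove it from the text."""
    if ":)" in tweet:
        return tweet.replace(":)", "").split(), "positive"
    if ":(" in tweet:
        return tweet.replace(":(", "").split(), "negative"
    return None  # no emoticon: tweet cannot be auto-labeled

labeled = []
for t in raw_tweets:
    result = label_and_clean(t)
    if result is not None:
        labeled.append(result)

# Count words per class for multinomial naive Bayes with add-one smoothing.
word_counts = defaultdict(Counter)
class_counts = Counter()
for words, label in labeled:
    class_counts[label] += 1
    word_counts[label].update(words)

vocab = {w for counts in word_counts.values() for w in counts}

def classify(words):
    """Pick the class with the highest log-probability score."""
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for w in words:
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("love my bPhone".split()))  # -> positive
print(classify("battery broke".split()))   # -> negative
```

The emoticons themselves never appear as features; only the surrounding words do, which is what lets the model generalize to future tweets that contain no emoticon at all.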