[Figure: the Tweet is compared against a lexicon of positive and negative words; p(positive) and p(negative) are computed and the max is taken]
Fig. 4.6 The sentiment analysis workflow. John's Tweet is compared against a lexicon of words and their likelihood to be positive/negative. The most probable label is then taken as that Tweet's sentiment.
or “negative”. After looking at these words, the algorithm judges whether the
Tweet is positive or negative by comparing the likelihood of each label.
This workflow is shown in Fig. 4.6.
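The "take the most probable label" step in Fig. 4.6 can be written as an argmax. The sketch below uses our own notation, not the original text's: s is the sentiment label and w_1, ..., w_n are the Tweet's words. The second form is the standard Naïve Bayes factorization. It follows from Bayes' rule, since p(w_1, ..., w_n) is the same for both labels, together with the assumption that words are independent of one another given the label.

$$
\hat{s} = \arg\max_{s \in \{\text{positive},\, \text{negative}\}} p(s \mid w_1, \ldots, w_n)
        = \arg\max_{s \in \{\text{positive},\, \text{negative}\}} p(s) \prod_{i=1}^{n} p(w_i \mid s)
$$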
To compare the content in the Tweets, we must first find a lexicon, a dictionary of
words and their positive and negative scores.⁴ When choosing a sentiment lexicon,
we need to be careful about the source used to build it. Words have different
sentiments in different contexts. For example, in a lexicon built by looking at
movie reviews, “bomb” would likely have a positive sentiment (“that movie was
the bomb”). In a lexicon built by looking at world news articles, “bomb” would
likely be negative (“the bomb detonated in...”).
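The context problem can be made concrete with a small sketch. The helper `build_lexicon` is hypothetical and does not come from the original text. It scores each word by the fraction of its occurrences that fall in texts labeled positive. The two corpora are made-up toy examples standing in for movie reviews and news articles. They are not real data.

```python
from collections import Counter

def build_lexicon(labeled_texts):
    """Return {word: fraction of its occurrences in "positive" texts}.

    Hypothetical helper for illustration only.
    """
    pos_counts = Counter()
    total_counts = Counter()
    for text, label in labeled_texts:
        for word in text.lower().split():
            total_counts[word] += 1
            if label == "positive":
                pos_counts[word] += 1
    return {w: pos_counts[w] / total_counts[w] for w in total_counts}

# Made-up toy corpora standing in for two different sources.
movie_reviews = [
    ("that movie was the bomb", "positive"),
    ("the soundtrack was the bomb", "positive"),
    ("boring plot and bad acting", "negative"),
]
news_articles = [
    ("the bomb detonated in the square", "negative"),
    ("police defused a bomb downtown", "negative"),
    ("the festival was a great success", "positive"),
]

movie_lexicon = build_lexicon(movie_reviews)
news_lexicon = build_lexicon(news_articles)
print(movie_lexicon["bomb"])  # 1.0
print(news_lexicon["bomb"])   # 0.0
```

Each lexicon reflects only the corpus it was built from, so the same word receives opposite scores.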
The sentiment analysis algorithm we use in this section is based on a Naïve Bayes
classifier. It classifies a Tweet as positive or negative by comparing each word in
the Tweet with the labeled words in the lexicon. If the words in the Tweet have
appeared more often in positive Tweets, the Tweet is labeled positive; if they have
been associated more with negative Tweets, it is labeled negative. Naïve Bayes
combines this per-word evidence by treating the words as independent of one another
given the label, and the label with the higher overall probability is chosen.
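A minimal multinomial Naïve Bayes sketch follows. It is a generic textbook formulation and not necessarily the exact classifier the authors use. The function names `train_naive_bayes` and `classify`, the toy training Tweets, and the choice of add-one (Laplace) smoothing are all our assumptions. Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long texts.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_tweets):
    """Count words per label (hypothetical helper for illustration).

    Returns (word_counts, doc_counts, vocab). Assumes both labels occur
    at least once in the training data.
    """
    word_counts = {"positive": Counter(), "negative": Counter()}
    doc_counts = Counter()
    for text, label in labeled_tweets:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["positive"]) | set(word_counts["negative"])
    return word_counts, doc_counts, vocab

def classify(text, word_counts, doc_counts, vocab):
    """Return the label maximizing log p(label) + sum of log p(word | label)."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Log prior: share of training Tweets carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        # Add-one smoothing denominator: tokens seen with this label + vocabulary size.
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    # The most probable label becomes the Tweet's sentiment.
    return max(scores, key=scores.get)

# Made-up training Tweets for illustration only.
training = [
    ("i love this phone", "positive"),
    ("what a great day", "positive"),
    ("i hate waiting in line", "negative"),
    ("this is a terrible day", "negative"),
]
model = train_naive_bayes(training)
print(classify("love this great phone", *model))     # positive
print(classify("i hate this terrible day", *model))  # negative
```

Smoothing matters here. Without it, a single word that never appeared with a label (such as "hate" in the positive Tweets) would force that label's probability to zero regardless of the other words.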
4.2.2.2 Building a Lexicon Automatically
To get around the potential issue of an unsuitable lexicon, we will construct
our lexicon automatically for each dataset. Because we are using data collected
directly from Twitter, we do not have explicit “positive” or “negative” labels.
⁴ Some sentiment lexicons are available for free, such as SentiWordNet (http://sentiwordnet.isti.cnr.it/).
 