Instead, we use Tweets that contain emoticons as labeled data. We will mark Tweets
that contain “:)”, “:D”, or similar as positive, and Tweets that contain “:(”, “;-(”, or
similar as negative. In this way we know that the lexicon is built using relevant data.
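The labeling rule above can be sketched in a few lines of Java. This is only an illustration: the class name EmoticonLabeler is hypothetical, the emoticon lists contain just the four examples named in the text, and treating a Tweet with both kinds of emoticon as unlabeled is an assumption rather than something the text specifies. Matching is by plain substring, so a Tweet such as "RT @user:Done" would be mislabeled by the ":D" rule.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper (not from the book's code): labels a Tweet by its emoticons. */
final class EmoticonLabeler {
    // Assumed lists: the text names only these four emoticons "or similar".
    static final List<String> POSITIVE = Arrays.asList(":)", ":D");
    static final List<String> NEGATIVE = Arrays.asList(":(", ";-(");

    /**
     * Returns "positive" or "negative" when the Tweet contains emoticons of
     * exactly one kind, and empty when it contains none or both (assumption).
     */
    static Optional<String> label(String text) {
        boolean pos = POSITIVE.stream().anyMatch(text::contains);
        boolean neg = NEGATIVE.stream().anyMatch(text::contains);
        if (pos == neg) {
            return Optional.empty();
        }
        return Optional.of(pos ? "positive" : "negative");
    }

    public static void main(String[] args) {
        System.out.println(label("Great turnout at the march :)"));
        System.out.println(label("Stuck in traffic again :("));
        System.out.println(label("General assembly at 7pm"));
    }
}
```

Only Tweets for which label returns a value would contribute to the lexicon; the rest are the ones to be scored later.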
Figure 4.6 shows the top 25 most likely words for each sentiment in the Occupy
Wall Street dataset. This gives us an idea of what words define each sentiment.
When reading these word clouds, one might be confused by the prominence of
“ows” in both groups. Although it is the most prominent word for both sentiments, its
appearance in a Tweet does nothing to help us determine that Tweet's sentiment. It is more
important to look for words that appear in one sentiment class but not the other, or
those whose size differs greatly between the two classes.
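One way to make "size difference" concrete is to compare how likely a word is under each class. The sketch below is an assumption, not the method used to draw Figure 4.6: it computes the log of the ratio between the add-one-smoothed word probabilities of the two classes. The class name DistinctiveWords and the sample Tweets are invented for illustration. A word that is equally common in both classes, as "ows" is here, scores zero, while a word concentrated in one class scores far from zero.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: scores how strongly a word favors one sentiment class. */
final class DistinctiveWords {

    /** Counts lowercase, whitespace-separated tokens across a list of Tweets. */
    static Map<String, Integer> countWords(List<String> tweets) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            for (String word : tweet.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    static int total(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }

    /**
     * log( P(word | positive) / P(word | negative) ) with add-one smoothing.
     * Positive values favor the positive class, negative values the negative
     * class, and values near zero mean the word is uninformative.
     */
    static double logRatio(String word, Map<String, Integer> pos, Map<String, Integer> neg) {
        Set<String> vocab = new HashSet<>(pos.keySet());
        vocab.addAll(neg.keySet());
        vocab.add(word); // keeps the denominators non-zero even for empty lexicons
        int v = vocab.size();
        double pPos = (pos.getOrDefault(word, 0) + 1.0) / (total(pos) + v);
        double pNeg = (neg.getOrDefault(word, 0) + 1.0) / (total(neg) + v);
        return Math.log(pPos / pNeg);
    }

    public static void main(String[] args) {
        // Invented sample data, not drawn from the Occupy Wall Street dataset.
        Map<String, Integer> pos = countWords(Arrays.asList(
                "ows solidarity march", "ows peaceful march"));
        Map<String, Integer> neg = countWords(Arrays.asList(
                "ows police arrests", "ows police violence"));
        for (String word : Arrays.asList("ows", "march", "police")) {
            System.out.println(word + " " + logRatio(word, pos, neg));
        }
    }
}
```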
4.2.2.3 The Sentiment Analysis Process
We have outlined the process to create a sentiment analysis framework. Listing 4.5
contains a snippet that performs the sentiment analysis task. The code begins by
enumerating each Tweet in the dataset, building a lexicon from the Tweets that use
an emoticon. Next, it enumerates the Tweets again, calculating a sentiment score for
each Tweet that does not have an emoticon. For the code that actually builds the
lexicon and calculates the sentiment score, see NaiveBayesSentimentClassifier.java.
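The two passes described above can be sketched over an in-memory list of Tweets. This is a minimal illustration under stated assumptions, not the contents of NaiveBayesSentimentClassifier.java: the class TwoPassSentiment, its tokenizer, the add-one smoothing, and the sample Tweets are all hypothetical. The score is the log-odds of positive versus negative, so values above zero lean positive and values below zero lean negative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical two-pass sketch; not the book's NaiveBayesSentimentClassifier. */
final class TwoPassSentiment {
    static final String[] POSITIVE = {":)", ":D"};  // assumed emoticon lists
    static final String[] NEGATIVE = {":(", ";-("};

    private final Map<String, Integer> posCounts = new HashMap<>();
    private final Map<String, Integer> negCounts = new HashMap<>();
    private int posTweets = 0;
    private int negTweets = 0;

    static boolean containsAny(String text, String[] marks) {
        return Arrays.stream(marks).anyMatch(text::contains);
    }

    /** Returns +1 (positive), -1 (negative), or 0 (no emoticon, or both kinds). */
    static int emoticonLabel(String text) {
        boolean pos = containsAny(text, POSITIVE);
        boolean neg = containsAny(text, NEGATIVE);
        return pos == neg ? 0 : (pos ? 1 : -1);
    }

    /** Strips the emoticons, lowercases, and keeps letters, '#' and '@'. */
    static String[] tokens(String text) {
        for (String mark : POSITIVE) text = text.replace(mark, " ");
        for (String mark : NEGATIVE) text = text.replace(mark, " ");
        return text.toLowerCase().replaceAll("[^a-z#@ ]", " ").trim().split("\\s+");
    }

    /** Pass 1: add the Tweet's words to the lexicon if it carries an emoticon. */
    void train(String text) {
        int label = emoticonLabel(text);
        if (label == 0) return;
        Map<String, Integer> counts = label > 0 ? posCounts : negCounts;
        if (label > 0) posTweets++; else negTweets++;
        for (String w : tokens(text)) {
            if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
        }
    }

    private static int total(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }

    /** Pass 2: smoothed log-odds that the Tweet is positive rather than negative. */
    double score(String text) {
        Set<String> vocab = new HashSet<>(posCounts.keySet());
        vocab.addAll(negCounts.keySet());
        int v = vocab.size() + 1; // one extra slot for unseen words
        int posTotal = total(posCounts);
        int negTotal = total(negCounts);
        double score = Math.log((posTweets + 1.0) / (negTweets + 1.0));
        for (String w : tokens(text)) {
            if (w.isEmpty()) continue;
            double pPos = (posCounts.getOrDefault(w, 0) + 1.0) / (posTotal + v);
            double pNeg = (negCounts.getOrDefault(w, 0) + 1.0) / (negTotal + v);
            score += Math.log(pPos / pNeg);
        }
        return score;
    }

    public static void main(String[] args) {
        // Invented sample Tweets standing in for the JSON file.
        List<String> tweets = Arrays.asList(
                "ows march was peaceful :)", "ows police arrests :(",
                "peaceful march downtown", "more arrests reported");
        TwoPassSentiment model = new TwoPassSentiment();
        for (String t : tweets) model.train(t);                // pass 1
        for (String t : tweets) {                              // pass 2
            if (emoticonLabel(t) == 0) System.out.println(model.score(t) + "\t" + t);
        }
    }
}
```

Keeping training and scoring as separate passes means every unlabeled Tweet is scored against the complete lexicon rather than a partially built one.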
Listing 4.5 Sentiment analysis runner
public class TestNBC {
    public static void main(String[] args) {
        String filename = args.length >= 1 ? args[0] : "testows.json";
        // initialize the sentiment classifier
        NaiveBayesSentimentClassifier nbsc = new NaiveBayesSentimentClassifier();
        try {
            // read the file, and train each document
            JsonStreamParser parser =
                new JsonStreamParser(new FileReader(filename));
            JsonObject elem;
            String text;
            while (parser.hasNext()) {
                elem = parser.next().getAsJsonObject();
                text = elem.get("text").getAsString();
                nbsc.trainInstance(text);
            }
            // now go through and classify each line as positive or negative
            parser =
 