to work with a professional Realtor and find your perfect
home.
Best Category: misc.forsale
For the martinLuther text, we get the following output:
Text: Luther taught that salvation and subsequently
eternity in heaven is not earned by good deeds but is
received only as a free gift of God's grace through faith
in Jesus Christ as redeemer from sin and subsequently
eternity in Hell.
Best Category: soc.religion.christian
They both correctly classified the text.
Sentiment analysis using LingPipe
Sentiment analysis is performed in a very similar manner to that of general text
classification. One difference is the use of only two categories: positive and negative.
We need to use data files to train our model. We will use a simplified version of the
sentiment analysis performed at
http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html, using sentiment data
developed for movie reviews
(http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz).
This data was developed from 1,000 positive and 1,000 negative reviews of movies found in
IMDb's movie archives.
These reviews need to be downloaded and extracted. A txt_sentoken directory will be
extracted along with its two subdirectories: neg and pos. Both of these subdirectories
contain movie reviews. Although some of these files can be held in reserve to evaluate the
model created, we will use all of them to simplify the explanation.
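For readers who do want to hold some files in reserve, the following is a minimal sketch of one way to do it, using only the standard Java library. The ReviewFiles class, its listReviews and select methods, and the choice of holding out every tenth file are illustrative assumptions and are not part of LingPipe or of the example developed in this section. The sketch also assumes that txt_sentoken was extracted into the current working directory unless another path is passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; the names here are for illustration only.
class ReviewFiles {

    // Lists the regular files directly inside dataRoot/category, sorted by
    // path so that the split below is reproducible between runs.
    static List<Path> listReviews(Path dataRoot, String category) throws IOException {
        try (Stream<Path> entries = Files.list(dataRoot.resolve(category))) {
            return entries.filter(Files::isRegularFile)
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    // Returns either the held-out files (every everyNth file, starting with
    // the first one) or the remaining training files.
    static List<Path> select(List<Path> files, int everyNth, boolean heldOut) {
        List<Path> selected = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) {
            boolean isHeldOut = (i % everyNth == 0);
            if (isHeldOut == heldOut) {
                selected.add(files.get(i));
            }
        }
        return selected;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the extracted archive; override with args[0].
        Path dataRoot = Paths.get(args.length > 0 ? args[0] : "txt_sentoken");
        for (String category : new String[] {"neg", "pos"}) {
            List<Path> files = listReviews(dataRoot, category);
            List<Path> training = select(files, 10, false);
            List<Path> evaluation = select(files, 10, true);
            System.out.println(category + ": " + training.size()
                    + " training files, " + evaluation.size() + " held out");
        }
    }
}
```

If each subdirectory holds 1,000 reviews, holding out every tenth file leaves 900 files per category for training and 100 for evaluation. Sorting the paths first keeps the split the same from one run to the next.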
We will start by reinitializing the variables declared in the Using LingPipe to classify
text section. The categories array is set to a two-element array to hold the two
categories. The classifier variable is assigned a new DynamicLMClassifier instance using
the new category array and an nGramSize of 8:
categories = new String[2];
categories[0] = "neg";
categories[1] = "pos";
nGramSize = 8;