import nltk
import nltk.classify.util
import numpy as np
from nltk.classify import NaiveBayesClassifier

# report the train/test split, then train and evaluate the classifier
print 'Train on %d instances\nTest on %d instances' % (
    len(trainfeats), len(testfeats))
classifier = NaiveBayesClassifier.train(trainfeats)
print 'Accuracy:', nltk.classify.util.accuracy(classifier, testfeats)
classifier.show_most_informative_features()

# prepare the confusion matrix: classify the held-out positive and
# negative feature sets separately so we know the actual class of each
pos = [classifier.classify(fs) for (fs, l) in posfeats[cutoff:]]
pos = np.array(pos)
neg = [classifier.classify(fs) for (fs, l) in negfeats[cutoff:]]
neg = np.array(neg)

print 'Confusion matrix:'
print '\t'*2, 'Predicted class'
print '-'*40
print '|\t %d (TP) \t|\t %d (FN) \t| Actual class' % (
    (pos == 'pos').sum(), (pos == 'neg').sum())
print '-'*40
print '|\t %d (FP) \t|\t %d (TN) \t|' % (
    (neg == 'pos').sum(), (neg == 'neg').sum())
print '-'*40
The output that follows shows that the naïve Bayes classifier is trained on 1,600
instances and tested on 400 instances from the movie review corpus. The classifier
achieves an accuracy of 73.5%. The most informative features for positive reviews
include words such as outstanding, vulnerable, and astounding; words such as
insulting, ludicrous, and uninvolving are the most informative features for
negative reviews. Finally, the output shows the confusion matrix for the
classifier, which allows a finer-grained evaluation of its performance.
Train on 1600 instances
Test on 400 instances
Accuracy: 0.735
Most Informative Features
outstanding = True pos : neg = 13.9 : 1.0
insulting = True neg : pos = 13.7 : 1.0
vulnerable = True pos : neg = 13.0 : 1.0
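The four cells of the confusion matrix translate directly into the standard evaluation metrics. The sketch below uses hypothetical counts (the excerpt above truncates the actual matrix), chosen only so that they reproduce the reported 73.5% accuracy on 400 test instances; precision, recall, and F1 for the positive class then follow from the usual definitions.

```python
# Hypothetical confusion-matrix counts, picked only so that
# (tp + tn) / total matches the reported 73.5% accuracy on 400 instances.
tp, fn = 150, 50   # actual-positive reviews: predicted pos / predicted neg
fp, tn = 56, 144   # actual-negative reviews: predicted pos / predicted neg

total = tp + fn + fp + tn
accuracy = float(tp + tn) / total      # fraction of all predictions correct
precision = float(tp) / (tp + fp)     # of predicted-positive, fraction correct
recall = float(tp) / (tp + fn)        # of actual-positive, fraction recovered
f1 = 2 * precision * recall / (precision + recall)

print('Accuracy: %.3f' % accuracy)
print('Precision (pos): %.3f' % precision)
print('Recall (pos): %.3f' % recall)
print('F1 (pos): %.3f' % f1)
```

Precision and recall are worth computing alongside accuracy here because the test set is balanced by construction; on imbalanced review data, accuracy alone would hide a classifier that favors the majority class.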