You now have all the data you need to run the naïve Bayes classifier.
There are a couple of different implementations of naïve Bayes in R; this
example uses the one from the e1071 package. Make sure the package is
installed before you load it:
> install.packages("e1071")
. . .
> library(e1071)
You can train your naïve Bayes classifier by giving it a table of plays with
their TF-IDF values and the actual class of each play (that is, whether the
play is a comedy, history, or tragedy). Note that you need to transpose the
results because the naiveBayes function expects each row to be a different
sample, with the columns being the features, whereas your results table has
the columns as samples (that is, plays) and the rows as features (that is,
TF-IDF values for a particular word). In R, you can transpose with the t()
function:
> classifier <- naiveBayes(t(results), categories[,1])
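The orientation issue is easier to see on a small matrix. The following is a minimal sketch in base R only; the names toy_results and toy_samples, the words, the play labels, and the numbers are all invented for illustration and are not taken from the real results table.

```r
# Toy TF-IDF matrix laid out like the results table described above:
# rows are features (words), columns are samples (plays).
# All names and values here are hypothetical.
toy_results <- matrix(
  c(0.10, 0.00, 0.30,
    0.00, 0.25, 0.05),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("crown", "love"), c("play_a", "play_b", "play_c"))
)

dim(toy_results)       # features x samples

# t() swaps rows and columns, giving one row per play and one column
# per word, which is the samples-by-features layout naiveBayes expects.
toy_samples <- t(toy_results)

dim(toy_samples)       # samples x features
rownames(toy_samples)  # the play names are now the row names
```

Because t() carries the dimnames along, the word names end up as column names, so each feature stays identifiable after the transpose.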
Finally, with the classifier trained, you can predict whether a play is a
comedy, history, or tragedy with the predict() method:
> predictions <- predict(classifier, t(results))
> predictions
 [1] history history history history history comedy  tragedy comedy
 [9] comedy  tragedy tragedy tragedy tragedy history history history
[17] history history history comedy  comedy  tragedy comedy  comedy
[25] comedy  comedy  comedy  tragedy history tragedy comedy  comedy
[33] tragedy tragedy tragedy comedy  comedy  comedy
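Besides class labels, the predict() method for e1071's naiveBayes objects can also return posterior probabilities through its type = "raw" argument, which helps show how confident each prediction is. The sketch below uses a small made-up data set rather than the plays; toy_x, toy_y, and the feature names word_a and word_b are hypothetical, and the example assumes e1071 is installed (the guard skips it otherwise).

```r
# Runs only if e1071 is available; all data below is invented.
if (requireNamespace("e1071", quietly = TRUE)) {
  # Samples-by-features matrix: 6 samples (rows), 2 features (columns)
  toy_x <- matrix(
    c(0.9, 0.1,
      0.8, 0.2,
      0.7, 0.3,
      0.2, 0.8,
      0.1, 0.9,
      0.3, 0.7),
    ncol = 2, byrow = TRUE,
    dimnames = list(NULL, c("word_a", "word_b"))
  )
  # One known class per sample (row)
  toy_y <- factor(c("comedy", "comedy", "comedy",
                    "tragedy", "tragedy", "tragedy"))

  toy_classifier <- e1071::naiveBayes(toy_x, toy_y)

  # Default type = "class": a factor of predicted labels
  toy_labels <- predict(toy_classifier, toy_x)

  # type = "raw": a matrix of posterior probabilities,
  # one row per sample and one column per class
  toy_probs <- predict(toy_classifier, toy_x, type = "raw")

  print(toy_labels)
  print(round(toy_probs, 3))
}
```

As with the plays, this sketch predicts on the same rows it was trained on, so the probabilities describe fit to the training data rather than performance on unseen samples.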
To test the validity of your predictions, you can check against the actual
values: