Using BigQuery from Third-Party Tools - Google BigQuery Analytics

Now you have all the data you need to run the naïve Bayes classifier.
R has several implementations of naïve Bayes; this example uses the one
from the e1071 package. Make sure the package is installed before you
use it:

> install.packages("e1071")

. . .

> library(e1071)

You can train your naïve Bayes classifier by giving it a table of plays with
their TF-IDF values, along with the known category of each play (that is,
whether the play is a comedy, history, or tragedy). Note that you need to
transpose the results first: the naiveBayes function expects each row to be
a different sample and each column to be a feature, whereas your results
table has the samples (that is, plays) as columns and the features (that is,
TF-IDF values for a particular word) as rows. In R, you can transpose with
the t() function:

> classifier <- naiveBayes(t(results), categories[,1])
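The orientation change described above can be seen on a small matrix. This is a minimal sketch in base R; the matrix `toy` and its row and column names are hypothetical stand-ins, not the `results` table from the example.

```r
# Hypothetical toy table: rows are features (words), columns are samples (plays).
toy <- matrix(
  c(0.9, 0.8, 0.1,
    0.1, 0.3, 0.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("word_a", "word_b"), c("play1", "play2", "play3"))
)

dim(toy)          # 2 rows (features) by 3 columns (samples)

# t() swaps rows and columns, giving one row per sample.
toy_t <- t(toy)
dim(toy_t)        # 3 rows (samples) by 2 columns (features)
rownames(toy_t)   # the play names are now row names
```

After the transpose, each play occupies one row, which is the layout the text says naiveBayes expects.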

Finally, with the classifier trained, you can predict whether a play is a
comedy, history, or tragedy with the predict() function:

> predictions <- predict(classifier, t(results))

> predictions

 [1] history history history history history comedy  tragedy comedy
 [9] comedy  tragedy tragedy tragedy tragedy history history history
[17] history history history comedy  comedy  tragedy comedy  comedy
[25] comedy  comedy  comedy  tragedy history tragedy comedy  comedy
[33] tragedy tragedy tragedy comedy  comedy  comedy
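The train-then-predict flow can also be tried end to end on made-up data. The following is a rough sketch that assumes the e1071 package is installed; the `scores` matrix, the word and play names, and the two-category labels are all hypothetical and are chosen only so that the classes are clearly separated.

```r
library(e1071)

# Hypothetical TF-IDF-like scores: rows are words, columns are plays.
scores <- matrix(
  c(0.9, 0.8, 0.7, 0.1, 0.2, 0.3,   # word_a
    0.1, 0.3, 0.2, 0.8, 0.9, 0.7),  # word_b
  nrow = 2, byrow = TRUE,
  dimnames = list(c("word_a", "word_b"), paste0("play", 1:6))
)

# Hypothetical known category for each play, in column order.
labels <- factor(c("comedy", "comedy", "comedy",
                   "tragedy", "tragedy", "tragedy"))

# Transpose so that each row is a sample, then train.
model <- naiveBayes(t(scores), labels)

# Predict on the same samples the model was trained on.
pred <- predict(model, t(scores))
print(pred)
```

As in the session above, the predictions here are made on the same data used for training, so they show how well the model fits that data rather than how it would perform on unseen plays.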

To check how accurate your predictions are, you can compare them against
the actual values:
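One common way to make such a comparison in base R is a cross-tabulation plus a simple accuracy figure. This is only a sketch on hypothetical vectors, not necessarily the check the example itself performs; in the session above, the actual values would presumably be categories[,1], the same column used for training.

```r
# Hypothetical stand-ins for the predicted and actual categories.
lvls <- c("comedy", "history", "tragedy")
predicted <- factor(c("comedy", "history", "tragedy", "comedy", "history"),
                    levels = lvls)
actual    <- factor(c("comedy", "history", "tragedy", "tragedy", "history"),
                    levels = lvls)

# Cross-tabulate: diagonal cells are correct predictions,
# off-diagonal cells are misclassifications.
confusion <- table(predicted = predicted, actual = actual)
print(confusion)

# Fraction of samples where the prediction matches the actual value.
accuracy <- mean(predicted == actual)
print(accuracy)
```

The table shows which categories get confused with each other, which a single accuracy number hides.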
