Using BigQuery from Third-Party Tools - Google BigQuery Analytics

:435.000 Max. :614.0000

…

This command creates a data frame with more than 16,000 rows, one row for each word that is used in more than one play but not all of them. The first column is the word; the subsequent columns are the TF-IDF values of the word in each of the plays. You can drop the first column at this point because you won't need to know which word it is to perform the prediction. However, you can store the word as the name of the row in the data frame in case you need it again.

> rownames(results) <- results$word
> results$word <- NULL
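To see what these two assignments do, here is a minimal sketch on a toy data frame. The name `toy`, the three words, and the TF-IDF numbers are made up for illustration and are not taken from the real `results` table; only the two play names are borrowed from the corpus list.

```r
# Hypothetical stand-in for `results`: one row per word, one numeric
# column per play. All values here are invented for illustration.
toy <- data.frame(
  word = c("crown", "dagger", "jest"),
  hamlet = c(0.10, 0.42, 0.00),
  asyoulikeit = c(0.00, 0.05, 0.37),
  stringsAsFactors = FALSE
)

# Keep the word as the row name, then drop the now-redundant column.
rownames(toy) <- toy$word
toy$word <- NULL

# Only the numeric play columns remain ...
print(colnames(toy))

# ... but a word can still be looked up by its row name.
print(toy["dagger", "hamlet"])
```

After this step every remaining column is numeric, which is what most modeling functions expect, while the row names preserve the word in case it is needed later.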

Next, you need to know whether a particular play is a comedy, a history, or a tragedy. You can look this information up from the list at plays.php and compute a lookup table by hand:

> categories_str = "
corpus, type
onekinghenryiv, history
onekinghenryvi, history
twokinghenryiv, history
twokinghenryvi, history
threekinghenryvi, history
allswellthatendswell, comedy
antonyandcleopatra, tragedy
asyoulikeit, comedy
comedyoferrors, comedy
coriolanus, tragedy
cymbeline, tragedy
hamlet, tragedy
juliuscaesar, tragedy
kinghenryv, history
kinghenryviii, history
kingjohn, history
kinglear, tragedy
kingrichardii, history
