:435.000 Max. :614.0000
This command creates a data frame with more than 16,000 rows, one row
for each word that is used in more than one play but not all of them. The
first column is the word; the subsequent columns are the TF-IDF values of
the word in each of the plays. You can drop the first column at this point
because you won't need to know which word it is to perform the prediction.
However, you can store the word as the name of the row in the data frame in
case you need it again.
> rownames(results) <- results$word
> results$word <- NULL
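As a minimal sketch of what those two lines do, the following uses a small made-up data frame in place of results. The name toy, the three words, and the TF-IDF values are all assumptions for illustration. It relies on each word appearing in exactly one row, since row names must be unique.

```r
# Toy stand-in for `results`: a word column plus one TF-IDF column per play.
# The words and values here are invented for illustration only.
toy <- data.frame(
  word = c("crown", "ghost", "jest"),
  hamlet = c(0.00, 0.12, 0.01),
  kingjohn = c(0.09, 0.00, 0.02),
  stringsAsFactors = FALSE
)

# Keep each word as the row name, then drop the now-redundant column.
rownames(toy) <- toy$word
toy$word <- NULL

# Only numeric columns remain, but a word can still be looked up by name.
crown_row <- toy["crown", ]
print(crown_row)
```

With the word column gone, every remaining column is numeric, which is convenient for the prediction step, and the row names still let you trace a row back to its word.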
Next, you need to know whether a particular play is a comedy, a history,
or a tragedy. You can look this information up in the list at
http://www.opensourceshakespeare.org/views/plays/plays.php and build a
lookup table by hand:
> categories_str = "
corpus, type
onekinghenryiv, history
onekinghenryvi, history
twokinghenryiv, history
twokinghenryvi, history
threekinghenryvi, history
allswellthatendswell, comedy
antonyandcleopatra, tragedy
asyoulikeit, comedy
comedyoferrors, comedy
coriolanus, tragedy
cymbeline, tragedy
hamlet, tragedy
juliuscaesar, tragedy
kinghenryv, history
kinghenryviii, history
kingjohn, history
kinglear, tragedy
kingrichardii, history