10
Working with Unstructured and Textual Data
In this chapter, we will cover the following recipes:
• Tokenizing text
• Finding sentences
• Focusing on content words with stoplists
• Getting document frequencies
• Scaling document frequencies by document size
• Scaling document frequencies with TF-IDF
• Finding people, places, and things with Named Entity Recognition
• Mapping documents to a sparse vector space representation
• Performing topic modeling with MALLET
• Performing naïve Bayesian classification with MALLET