10
Working with Unstructured and Textual Data
In this chapter, we will cover the following recipes:
• Tokenizing text
• Finding sentences
• Focusing on content words with stoplists
• Getting document frequencies
• Scaling document frequencies by document size
• Scaling document frequencies with TF-IDF
• Finding people, places, and things with Named Entity Recognition
• Mapping documents to a sparse vector space representation
• Performing topic modeling with MALLET
• Performing naïve Bayesian classification with MALLET