Extracting the right features from your data
The field of natural language processing (NLP) covers a wide range of techniques to
work with text, from text processing and feature extraction through to modeling and
machine learning. In this chapter, we will focus on two feature extraction techniques
available within MLlib: the TF-IDF term weighting scheme and feature hashing.
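To make the two techniques concrete before turning to MLlib itself, here is a minimal sketch in plain Python. It is not MLlib's implementation: the function names, the use of crc32 as the hash, and the default vector size are assumptions chosen for illustration. The IDF uses a smoothed form, log((N + 1) / (df + 1)), where N is the number of documents and df is the number of documents containing the term.

```python
import math
import zlib
from collections import Counter


def hashed_tf(tokens, num_features=1024):
    """Build a sparse term-frequency vector using the hashing trick.

    Each token is mapped straight to an index, so no vocabulary is stored.
    crc32 is an illustrative, deterministic stand-in for a real hash function.
    """
    counts = Counter()
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        counts[idx] += 1
    return dict(counts)


def idf_weights(tf_vectors):
    """Smoothed inverse document frequency for every index that occurs."""
    n_docs = len(tf_vectors)
    doc_freq = Counter()
    for vec in tf_vectors:
        doc_freq.update(vec.keys())  # each index counted once per document
    return {i: math.log((n_docs + 1) / (df + 1)) for i, df in doc_freq.items()}


def tf_idf(tf_vectors):
    """Scale each term frequency by the IDF of its index."""
    idf = idf_weights(tf_vectors)
    return [{i: tf * idf[i] for i, tf in vec.items()} for vec in tf_vectors]


docs = [["spark", "mllib", "spark"], ["spark", "text"], ["hashing", "text"]]
tf_vectors = [hashed_tf(d) for d in docs]
weighted = tf_idf(tf_vectors)
print(weighted)
```

Two properties are worth noticing. A term that occurs in every document receives a weight of zero under this formula, which is the sense in which TF-IDF down-weights uninformative terms. With hashing, distinct terms can collide on the same index; a larger num_features makes collisions less likely at the cost of a wider vector.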
Working through an example of TF-IDF, we will also explore the ways in which
processing, tokenization, and filtering during feature extraction can help reduce the
dimensionality of our input data as well as improve the information content and
usefulness of the features we extract.
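As a rough illustration of how tokenization and filtering reduce dimensionality, the following sketch compares the vocabulary obtained by naive whitespace splitting with the vocabulary left after lowercasing, splitting on non-word characters, and dropping stop words and very short or numeric tokens. The stop word list, the helper names, and the sample sentence are all assumptions made for this example.

```python
import re

# A tiny, illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}


def tokenize(text):
    """Lowercase the text and split on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def filter_tokens(tokens, stop_words=STOP_WORDS, min_len=2):
    """Drop stop words, tokens shorter than min_len, and purely numeric tokens."""
    return [
        t for t in tokens
        if t not in stop_words and len(t) >= min_len and not t.isdigit()
    ]


raw = "The Spark engine, the SPARK API, and 2 of the Spark libraries."

naive_vocab = set(raw.split())                    # case and punctuation kept
clean_vocab = set(filter_tokens(tokenize(raw)))   # normalized and filtered

print(len(naive_vocab), sorted(naive_vocab))
print(len(clean_vocab), sorted(clean_vocab))
```

On this sentence the naive split yields 10 distinct strings, because "Spark", "SPARK", "The", and "the" are treated as different terms and punctuation stays attached to words. After processing, only 4 terms remain, each of which carries more information about the content of the text.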