Summary
In this chapter, we took a deeper look at more complex text processing and explored
MLlib's text feature extraction capabilities, in particular the TF-IDF term weighting
scheme. We covered examples of using the resulting TF-IDF feature vectors to compute
document similarity and to train a newsgroup topic classification model. Finally, you
learned how to use MLlib's Word2Vec model to compute vector representations of the words
in a corpus of text, and how to use the trained model to find words whose contextual
meaning is similar to that of a given word.
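As a compact reminder of the document similarity idea, the following is a minimal sketch in plain Python rather than MLlib code. The function names tf_idf_vectors and cosine_similarity and the three toy documents are hypothetical and chosen only for illustration. The sketch assumes raw term counts for TF and a smoothed IDF of the form log((N + 1) / (df + 1)).

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (an assumption of this sketch).
    idf = {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}
    # Weight each raw term count by the term's IDF.
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus (hypothetical): two hockey documents and one graphics document.
docs = [
    "hockey team wins the game".split(),
    "hockey players win another game".split(),
    "new graphics card renders images".split(),
]
vectors = tf_idf_vectors(docs)
sim_hockey = cosine_similarity(vectors[0], vectors[1])
sim_mixed = cosine_similarity(vectors[0], vectors[2])
```

Here the two hockey documents share weighted terms and so score above zero, while the hockey and graphics documents share no terms and score exactly zero.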
In the next chapter, we will take a look at online learning, and you will learn how Spark
Streaming relates to online learning models.