Advanced Text Processing with Spark - Machine Learning with Spark

Summary

In this chapter, we took a closer look at more complex text processing and explored MLlib's text feature extraction capabilities, in particular the TF-IDF term weighting scheme. We covered examples of using the resulting TF-IDF feature vectors to compute document similarity and to train a newsgroup topic classification model. Finally, you learned how to use MLlib's Word2Vec model to compute a vector representation of the words in a corpus of text, and how to use the trained model to find words whose contextual meaning is similar to that of a given word.
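
As a compact reminder of the TF-IDF and document similarity ideas recapped above, the following is a minimal plain-Python sketch. It is not MLlib's API and not the chapter's code: the function names `tf_idf` and `cosine_similarity` and the three toy documents are assumptions made for illustration. The smoothed IDF form, log((N + 1) / (df + 1)), is chosen to match the formula documented for MLlib's IDF.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one sparse {term: weight} dict per tokenized document.

    Hypothetical helper for illustration; not an MLlib function.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency: log((N + 1) / (df + 1)).
    idf = {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}
    # Weight each raw term count by the term's IDF.
    return [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus (made up for this sketch): two hockey posts and one space post.
docs = [
    "hockey game goal ice".split(),
    "hockey team goal season".split(),
    "space shuttle orbit launch".split(),
]

vectors = tf_idf(docs)
sim_hockey = cosine_similarity(vectors[0], vectors[1])
sim_mixed = cosine_similarity(vectors[0], vectors[2])
print(sim_hockey, sim_mixed)
```

The two hockey posts share terms and therefore score higher than the hockey and space pair, which share none. This is the same intuition behind comparing newsgroup messages by their TF-IDF vectors.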

In the next chapter, we will take a look at online learning, and you will learn how Spark Streaming relates to online learning models.
