...
14/10/25 14:22:02 INFO SparkContext: Job finished: collect at Word2Vec.scala:368, took 56.585983 s
14/10/25 14:22:02 INFO MappedRDD: Removing RDD 200 from persistence list
14/10/25 14:22:02 INFO BlockManager: Removing RDD 200
14/10/25 14:22:02 INFO BlockManager: Removing block rdd_200_0
14/10/25 14:22:02 INFO MemoryStore: Block rdd_200_0 of size 9008840 dropped from memory (free 1755596828)
word2vecModel: org.apache.spark.mllib.feature.Word2VecModel = org.apache.spark.mllib.feature.Word2VecModel@2b94e480
Once the model is trained, we can easily find the top 20 synonyms for a given term (that is, the terms most similar to the input term, as measured by the cosine similarity between their word vectors).
For example, to find the 20 terms most similar to hockey, use the following line of code:
word2vecModel.findSynonyms("hockey", 20).foreach(println)
As we can see from the following output, most of the terms relate to hockey or other
sports topics:
(sport,0.6828256249427795)
(ecac,0.6718048453330994)
(hispanic,0.6519884467124939)
(glens,0.6447514891624451)
(woofers,0.6351765394210815)
(boxscores,0.6009076237678528)
(tournament,0.6006366014480591)
(champs,0.5957855582237244)
(aargh,0.584071934223175)
(playoff,0.5834275484085083)
(ahl,0.5784651637077332)
(ncaa,0.5680188536643982)
(pool,0.5612311959266663)
(olympic,0.5552600026130676)
(champion,0.5549421310424805)
(filinuk,0.5528956651687622)
(yankees,0.5502706170082092)
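To make the idea behind a synonym query concrete, the following is a minimal, self-contained sketch of ranking words by cosine similarity, cos(a, b) = a·b / (|a| |b|). It is not Spark's implementation of findSynonyms. The object name SynonymSketch, its helper methods, and the three-dimensional toyVectors vocabulary are all hypothetical and exist only for illustration; a trained Word2Vec model would supply real, much higher-dimensional vectors.

```scala
// Hypothetical sketch: rank words by cosine similarity to a query word.
// Not Spark's implementation; the vectors below are made up for illustration.
object SynonymSketch {

  // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0.0 if either vector is all zeros.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Return the `num` words most similar to `word`, excluding the word itself,
  // sorted by descending cosine similarity.
  def findSynonyms(vectors: Map[String, Array[Double]], word: String, num: Int): Array[(String, Double)] = {
    val query = vectors.getOrElse(word, throw new IllegalArgumentException(s"$word not in vocabulary"))
    vectors.toArray
      .filter { case (w, _) => w != word }
      .map { case (w, v) => (w, cosine(query, v)) }
      .sortWith((x, y) => x._2 > y._2)
      .take(num)
  }
}

// Hypothetical 3-dimensional word vectors (illustration only).
val toyVectors = Map(
  "hockey"     -> Array(0.9, 0.1, 0.0),
  "playoff"    -> Array(0.8, 0.2, 0.1),
  "tournament" -> Array(0.7, 0.3, 0.0),
  "woofers"    -> Array(0.0, 0.1, 0.9)
)

SynonymSketch.findSynonyms(toyVectors, "hockey", 2).foreach(println)
```

With these made-up vectors, playoff ranks first and tournament second, while woofers, whose vector points in a very different direction, scores close to zero. Because cosine similarity depends only on the angle between vectors and not on their lengths, words are compared by direction in the embedding space rather than by magnitude.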