Obtaining, Processing, and Preparing Data with Spark - Machine Learning with Spark

<1x2645 sparse matrix of type '<type 'numpy.float64'>'
    with 1 stored elements in Compressed Sparse Column format>,
<1x2645 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Column format>,
<1x2645 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Column format>,
<1x2645 sparse matrix of type '<type 'numpy.float64'>'
    with 1 stored elements in Compressed Sparse Column format>]

Each movie title has now been transformed into a sparse vector. Titles from which we extracted two terms have two non-zero entries in their vectors, titles from which we extracted only one term have one non-zero entry, and so on.
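
As a minimal sketch of how such a vector can be built, the function below maps a title's terms to a 1 x N SciPy sparse row vector with a 1.0 at each known term's index. The name `create_vector`, the toy `term_dict`, and the sample terms are illustrative assumptions rather than the exact code from the preceding example.

```python
from scipy import sparse as sp


def create_vector(terms, term_dict):
    """Encode a list of terms as a 1 x len(term_dict) sparse row vector.

    term_dict maps each known term to its column index; unknown terms are skipped.
    """
    num_terms = len(term_dict)
    # Deduplicate indices so a repeated term still yields a single 1.0 entry.
    cols = sorted({term_dict[t] for t in terms if t in term_dict})
    if not cols:
        # No known terms: return an all-zero vector.
        return sp.csc_matrix((1, num_terms), dtype=float)
    data = [1.0] * len(cols)
    rows = [0] * len(cols)
    return sp.csc_matrix((data, (rows, cols)), shape=(1, num_terms), dtype=float)


# Hypothetical toy dictionary; the vectors shown in the text have 2645 columns.
term_dict = {"dead": 0, "man": 1, "walking": 2, "toy": 3, "story": 4}
two_terms = create_vector(["dead", "man"], term_dict)
one_term = create_vector(["story", "unknown"], term_dict)
print(two_terms.shape, two_terms.nnz, one_term.nnz)
```

The `nnz` attribute reports the number of stored elements, which is the count that appears in each vector's printed representation.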

Tip

Note the use of Spark's broadcast method in the preceding example code to create a broadcast variable that contains the term dictionary. In real-world applications, such term dictionaries can be extremely large, so using a broadcast variable is advisable: the dictionary is then shipped to each worker node once and cached there, rather than being serialized along with every task.
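
The broadcast pattern described in this tip can be sketched as follows. This is a hedged illustration, not the preceding example's exact code: `term_indices` and `encode_titles` are hypothetical helper names, and `sc` is assumed to be an existing SparkContext. Only `sc.broadcast`, the broadcast variable's `.value` attribute, `sc.parallelize`, `map`, and `collect` are Spark API calls.

```python
def term_indices(terms, term_dict_bcast):
    """Look up each term's index in the broadcast dictionary, skipping unknown terms."""
    term_dict = term_dict_bcast.value  # read the worker-local copy of the dictionary
    return [term_dict[t] for t in terms if t in term_dict]


def encode_titles(sc, title_terms, term_dict):
    """Map each title's term list to term indices using a broadcast dictionary.

    sc is assumed to be a live pyspark SparkContext (hypothetical setup);
    title_terms is a list of term lists, one per title.
    """
    # Ship the dictionary to the workers once instead of with every task closure.
    term_dict_bcast = sc.broadcast(term_dict)
    rdd = sc.parallelize(title_terms)
    return rdd.map(lambda terms: term_indices(terms, term_dict_bcast)).collect()
```

With a running context, for example `sc = SparkContext("local[2]", "Broadcast sketch")`, calling `encode_titles(sc, [["dead", "man"], ["story"]], {"dead": 0, "man": 1, "story": 2})` returns one list of indices per title. Inside the mapped function, the dictionary is always read through `.value`, so the closure carries only a lightweight handle to the broadcast data.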
