Database Reference
In-Depth Information
<1x2645 sparse matrix of type '<type 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Column
format>,
<1x2645 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Column
format>,
<1x2645 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Column
format>,
<1x2645 sparse matrix of type '<type 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Column
format>]
We can see that each movie title has now been transformed into a sparse vector. We can
see that the titles where we extracted two terms have two non-zero entries in the vector,
titles where we extracted only one term have one non-zero entry, and so on.
Tip
Note the use of Spark's broadcast method in the preceding example code to create a
broadcast variable that contains the term dictionary. In real-world applications, such term
dictionaries can be extremely large, so using a broadcast variable is advisable.
Search WWH ::




Custom Search