Using packages for feature extraction
While we have covered many different approaches to feature extraction, it would be rather
painful to write the code for these common tasks each and every time. We could certainly
create our own reusable code libraries for this purpose; fortunately, however, we can rely
on existing tools and packages.
Since Spark provides Scala, Java, and Python APIs, we can use packages available in
these languages that offer sophisticated tools to process data, extract features, and
represent them as vectors. A few examples of packages for feature extraction include
scikit-learn, gensim, scikit-image, matplotlib, and NLTK in Python; OpenNLP in Java; and
Breeze and Chalk in Scala. In fact, Spark MLlib has relied on Breeze since version 1.0, and
we will see how to use some Breeze functionality for linear algebra in later chapters.
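
As a small illustration of what such a package provides, the following sketch uses scikit-learn's CountVectorizer to turn raw text into term-count vectors. It assumes scikit-learn is installed; the two sample sentences and the variable names are made up for this example and do not come from any dataset used in this book.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two toy documents (illustrative only)
docs = [
    "spark makes feature extraction easier",
    "feature extraction turns text into vectors",
]

# Tokenize, build the vocabulary, and count terms in one step
vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(docs)  # sparse document-term matrix

# Maps each term to its column index in the matrix
vocab = vectorizer.vocabulary_

# One row per document, one column per term
dense = matrix.toarray()

print(sorted(vocab))
print(dense)
```

Each row of the result is an ordinary numeric array, so the tokenization, vocabulary construction, and counting that we would otherwise code by hand are handled by a single library call.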