The dimensionality reduction techniques in MLlib are only available in Scala or Java at the time of writing this book, so we will continue to use the Scala Spark shell to run the models. Therefore, you won't need to run a PySpark console.
Tip
We have provided the full Python code with this chapter as a Python script as well as in
the IPython Notebook format. For instructions on installing IPython, see the code bundle.
Let's display the image given by the first path we extracted earlier using matplotlib's imread and imshow functions:
# imread and imshow are available directly if the notebook was started in
# pylab mode; otherwise, import them explicitly from matplotlib.pyplot
from matplotlib.pyplot import imread, imshow
path = "/PATH/lfw/Aaron_Eckhart/Aaron_Eckhart_0001.jpg"
ae = imread(path)
imshow(ae)
Note
You should see the image displayed in your Notebook (or in a pop-up window if you are
using the standard IPython shell). Note that we have not shown the image here.
Extracting facial images as vectors
While a full treatment of image processing is beyond the scope of this book, we will need to know a few basics to proceed. Each color image can be represented as a three-dimensional array, or matrix, of pixels. The first two dimensions, that is the x and y axes, represent the position of each pixel, while the third dimension represents the red, green, and blue (RGB) color values for each pixel.
A grayscale image only requires one value per pixel (there are no RGB values), so it can be represented as a plain two-dimensional matrix. For many image-processing and machine learning tasks related to images, it is common to operate on grayscale images. We will do this here by converting the color images to grayscale first.
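As a rough illustration of this conversion (not the chapter's own code, which may perform the conversion differently), the following Python sketch assumes the Pillow library is installed and converts the image loaded earlier to a grayscale matrix:
# A minimal sketch of color-to-grayscale conversion, assuming Pillow is available
from PIL import Image
import numpy as np

path = "/PATH/lfw/Aaron_Eckhart/Aaron_Eckhart_0001.jpg"
img = Image.open(path)
gray = img.convert("L")          # "L" mode keeps a single luminance value per pixel
gray_matrix = np.asarray(gray)   # two-dimensional array: height x width
print(gray_matrix.shape)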
It is also a common practice in machine learning tasks to represent an image as a vector, instead of a matrix. We do this by concatenating each row (or alternatively, each column) of the matrix together to form a long vector (this is known as reshaping). In this way, each raw, grayscale image matrix is transformed into a feature vector that is usable as input to a machine learning model.
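To make the reshaping step concrete, here is a small NumPy sketch (the array values are made up purely for illustration) that flattens a grayscale matrix row by row into a single vector:
import numpy as np

# A toy 2 x 3 "grayscale image" used only for illustration
gray_matrix = np.array([[10, 20, 30],
                        [40, 50, 60]])

# Concatenate the rows into one long vector (row-major order)
vector = gray_matrix.flatten()
print(vector)        # [10 20 30 40 50 60]
print(vector.shape)  # (6,)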