The dimensionality reduction techniques in MLlib are only available in Scala or Java at the time of writing this book, so we will continue to use the Scala Spark shell to run the models. Therefore, you won't need to run a PySpark console.
Tip
We have provided the full Python code with this chapter as a Python script as well as in
the IPython Notebook format. For instructions on installing IPython, see the code bundle.
Let's display the image given by the first path we extracted earlier using matplotlib's imread and imshow functions:
# imread and imshow are available directly if the notebook was started in
# pylab mode; otherwise, import them explicitly from matplotlib.pyplot
from matplotlib.pyplot import imread, imshow
path = "/PATH/lfw/Aaron_Eckhart/Aaron_Eckhart_0001.jpg"
ae = imread(path)
imshow(ae)
Note
You should see the image displayed in your Notebook (or in a pop-up window if you are
using the standard IPython shell). Note that we have not shown the image here.
Extracting facial images as vectors
While a full treatment of image processing is beyond the scope of this book, we will need to know a few basics to proceed. Each color image can be represented as a three-dimensional array, or matrix, of pixels. The first two dimensions, that is the x and y axes, represent the position of each pixel, while the third dimension represents the red, green, and blue (RGB) color values for each pixel.
A grayscale image only requires one value per pixel (there are no RGB values), so it can be represented as a plain two-dimensional matrix. For many image-processing and machine learning tasks related to images, it is common to operate on grayscale images. We will do this here by converting the color images to grayscale first.
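As a rough illustration of this conversion (not the chapter's own code, which may perform the conversion differently), the following Python sketch assumes the Pillow library is installed and converts the image loaded earlier to a grayscale matrix:
# A minimal sketch of color-to-grayscale conversion, assuming Pillow is available
from PIL import Image
import numpy as np

path = "/PATH/lfw/Aaron_Eckhart/Aaron_Eckhart_0001.jpg"
img = Image.open(path)
gray = img.convert("L")          # "L" mode keeps a single luminance value per pixel
gray_matrix = np.asarray(gray)   # two-dimensional array: height x width
print(gray_matrix.shape)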
It is also a common practice in machine learning tasks to represent an image as a vector, instead of a matrix. We do this by concatenating each row (or alternatively, each column) of the matrix together to form a long vector (this is known as reshaping). In this way, each raw, grayscale image matrix is transformed into a feature vector that is usable as input to a machine learning model.
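To make the reshaping step concrete, here is a small NumPy sketch (the array values are made up purely for illustration) that flattens a grayscale matrix row by row into a single vector:
import numpy as np

# A toy 2 x 3 "grayscale image" used only for illustration
gray_matrix = np.array([[10, 20, 30],
                        [40, 50, 60]])

# Concatenate the rows into one long vector (row-major order)
vector = gray_matrix.flatten()
print(vector)        # [10 20 30 40 50 60]
print(vector.shape)  # (6,)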