These steps are not strictly necessary, but both are commonly done for efficiency. Using
RGB color images instead of grayscale increases the amount of data to be
processed by a factor of 3. Similarly, larger images increase the processing and storage
overhead significantly. Our raw 250 x 250 images represent 187,500 data points per image
using three color components. For a set of 1055 images, this is 197,812,500 data points.
Even if each value is stored as a 4-byte integer, just 1055
images represent around 800 MB of memory! As you can see, image-processing tasks can
quickly become extremely memory intensive.
If we convert to grayscale and resize the images to, say, 50 x 50 pixels, we only require
2500 data points per image. For our 1055 images, this equates to around 10 MB of memory,
which is far more manageable for illustrative purposes.
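As a rough sanity check, the arithmetic above can be reproduced with a few lines of Scala (this sketch assumes each value is stored as a 4-byte Int and reports decimal megabytes):

// Reproduce the storage figures quoted above.
val numImages = 1055
val rawPointsPerImage = 250 * 250 * 3                      // 187,500 data points per RGB image
val totalRawPoints = rawPointsPerImage.toLong * numImages  // 197,812,500 data points in total
val rawBytes = totalRawPoints * 4                          // ~791 MB at 4 bytes per value
val grayPointsPerImage = 50 * 50                           // 2,500 data points per grayscale image
val grayBytes = grayPointsPerImage.toLong * numImages * 4  // ~10.5 MB
println(f"raw: ${rawBytes / 1e6}%.1f MB, resized grayscale: ${grayBytes / 1e6}%.1f MB")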
Tip
Another reason to resize is that MLlib's PCA model works best on tall-and-skinny
matrices with fewer than 10,000 columns. We will have 2500 columns (that is, each pixel
becomes an entry in our feature vector), so we will come in well below this restriction.
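To make that shape constraint concrete, here is a minimal, hypothetical sketch that builds a tall-and-skinny RowMatrix with 2500 columns and runs MLlib's PCA on it. The dummy rows and the SparkContext sc (as in the Spark shell) are placeholders for the real per-image feature vectors we build later in the chapter:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Hypothetical stand-ins for the real 2500-element per-image feature vectors;
// assumes a SparkContext named sc, as in the Spark shell.
val dummyRows = sc.parallelize(Seq(
  Vectors.dense(Array.fill(2500)(0.0)),
  Vectors.dense(Array.fill(2500)(1.0))
))
// A tall-and-skinny RowMatrix: many rows (images), only 2500 columns (pixels),
// comfortably below the 10,000-column guideline.
val matrix = new RowMatrix(dummyRows)
val pc = matrix.computePrincipalComponents(10)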
Let's define our processing function. We will do the grayscale conversion and resizing in
one step, using the java.awt.image package:
import java.awt.image.BufferedImage

def processImage(image: BufferedImage, width: Int, height: Int): BufferedImage = {
  // create a new image of the target size with a grayscale color model
  val bwImage = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY)
  val g = bwImage.getGraphics()
  // drawImage handles both the color conversion and the resizing
  g.drawImage(image, 0, 0, width, height, null)
  g.dispose()
  bwImage
}
The function creates a new image of the desired width and height, specifying a grayscale
color model, and then draws the original image onto this newly created image. The
drawImage method takes care of the color conversion and resizing for us! Finally, we
return the new, processed image.
Let's test this out on our sample image. We will convert it to grayscale and resize it to 100
x 100 pixels.
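A minimal sketch of this test, assuming the sample image was loaded earlier into a BufferedImage named aeImage (an illustrative name; any loaded image and output path will do):

// aeImage: a BufferedImage loaded earlier from disk (illustrative name)
val grayImage = processImage(aeImage, 100, 100)
println(s"${grayImage.getWidth} x ${grayImage.getHeight}, type: ${grayImage.getType}")

// optionally write the result out so we can inspect it visually
import javax.imageio.ImageIO
import java.io.File
ImageIO.write(grayImage, "jpg", new File("/tmp/aeGray.jpg"))

Opening the written file should show a smaller, grayscale version of the original image.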