Extracting features from the LFW dataset
To avoid downloading and processing a very large dataset, we will work with a subset of
the images: those of people whose names start with an "A". This dataset can be downloaded
from http://vis-www.cs.umass.edu/lfw/lfw-a.tgz.
Note
For more details and other variants of the data, visit http://vis-www.cs.umass.edu/lfw/.
The original research paper reference is:
Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled Faces in
the Wild: A Database for Studying Face Recognition in Unconstrained Environments.
University of Massachusetts, Amherst, Technical Report 07-49, October 2007.
It can be downloaded from http://vis-www.cs.umass.edu/lfw/lfw.pdf.
Unzip the data using the following command:
>tar xfvz lfw-a.tgz
This will create a folder called lfw, which contains a number of subfolders, one for each
person.
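As a quick sanity check on the extracted layout, the following is a minimal sketch in plain Scala (no Spark needed) that counts the images in each person's subfolder. It can be pasted into any Scala REPL, including the Spark shell. The helper name imagesPerPerson, the relative path lfw, and the .jpg extension are assumptions made for illustration; adjust them to match where and how the archive was extracted.

```scala
import java.io.File

// Hypothetical helper: returns (subfolder name, number of .jpg files) for each
// subfolder of `root`, sorted by name. The .jpg extension is an assumption.
def imagesPerPerson(root: File): Seq[(String, Int)] = {
  // listFiles returns null if `root` is missing or is not a directory
  val dirs = Option(root.listFiles).getOrElse(Array.empty[File]).filter(_.isDirectory)
  dirs.map { d =>
    val jpgs = Option(d.listFiles).getOrElse(Array.empty[File])
      .count(f => f.isFile && f.getName.toLowerCase.endsWith(".jpg"))
    (d.getName, jpgs)
  }.sortBy(_._1).toSeq
}

// Assumes the archive was extracted into the current working directory
val counts = imagesPerPerson(new File("lfw"))
println(s"${counts.size} people, ${counts.map(_._2).sum} images")
counts.take(5).foreach { case (name, n) => println(s"$name: $n") }
```

If the lfw folder is not found, the sketch simply reports zero people and zero images rather than failing.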
Exploring the face data
Start your Spark Scala console, making sure to allocate sufficient memory, as
dimensionality reduction methods can be quite computationally expensive:
>$SPARK_HOME/bin/spark-shell --driver-memory 2g
Now that we've unzipped the data, we face a small challenge. Spark provides us with a way
to read text files and custom Hadoop input data sources. However, there is no built-in
functionality to allow us to read images.
Spark provides a method called wholeTextFiles, which allows us to operate on entire
files at once. This contrasts with the textFile method that we have been using so far,
which operates on the individual lines within a text file (or multiple files).
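The difference between the two methods can be illustrated with a small sketch. It writes two tiny text files to a temporary directory (made-up sample data, not part of the LFW dataset) and reads them both ways. SparkContext.getOrCreate reuses the context the Spark shell already provides; outside the shell, the local master setting and application name used here are assumptions.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the shell's context if one exists; otherwise start a local one (assumed settings)
val conf = new SparkConf().setAppName("whole-text-files-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Made-up sample data: two small text files in a temporary directory
val dir = Files.createTempDirectory("wholetext-demo")
Files.write(dir.resolve("a.txt"), "one\ntwo\n".getBytes(StandardCharsets.UTF_8))
Files.write(dir.resolve("b.txt"), "three\n".getBytes(StandardCharsets.UTF_8))
val path = dir.toUri.toString // explicit file: URI, independent of the default filesystem

// textFile: one record per line, across all files in the directory
val lines = sc.textFile(path)
val lineCount = lines.count()

// wholeTextFiles: one (file path, file content) record per file
val files = sc.wholeTextFiles(path)
val fileCount = files.count()

println(s"textFile records: $lineCount, wholeTextFiles records: $fileCount")
files.keys.collect().foreach(println)
```

Here textFile yields three records (one per line), while wholeTextFiles yields two (one per file). The keys of the wholeTextFiles result are the file paths, which is the part that matters when the files themselves are not text.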