5.3 Feature Extraction
5.3.1 Morphological Operations
To optimise blob detection, we experimented with a number of convolution
kernels:
Custom kernels for erosion and dilation
Built-in convolution kernels in OpenCV
Morphological transformations for opening and closing the image
The morphological operations of opening and closing an image, predefined in
OpenCV, are each a combination of erosion and dilation: opening is an erosion
followed by a dilation, and closing is a dilation followed by an erosion. Their
effects are as follows:
Opening an image removes small bright regions of the image. The larger bright
regions are isolated but retain their size. We tried opening the image because a
relatively small bright region (an artefact of the subject wearing a nose ring)
was interfering with our nostril detection algorithm.
Closing an image joins bright regions within the image. The dark regions remain
dark and their size is unchanged. We tried this with the aim of providing better
contrast between the nostrils and pupils and the brighter areas that surround them.
Opening and then closing an image should theoretically discard the smaller bright
regions and then join all the larger bright regions while keeping the dark regions
almost unchanged. Although this makes the nostrils more obvious to the human eye,
OpenCV blob detection (based on contour detection) fails because this operation
has the effect of destroying edges.
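The effects described above can be illustrated with a minimal pure-Python sketch. This is not the OpenCV implementation and not the code used in our experiments: the helper names (`erode`, `dilate`, `opening`, `closing`), the odd square window of size 3 and the toy images are assumptions chosen only for illustration. Pixels outside the image are simply ignored at the borders.

```python
def _window_filter(image, ksize, pick):
    """Apply pick (min or max) over a ksize x ksize window centred on each pixel.

    ksize is assumed to be odd; pixels outside the image are ignored.
    """
    rows, cols = len(image), len(image[0])
    r = ksize // 2
    return [
        [
            pick(
                image[yy][xx]
                for yy in range(max(0, y - r), min(rows, y + r + 1))
                for xx in range(max(0, x - r), min(cols, x + r + 1))
            )
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def erode(image, ksize=3):
    # Erosion: each pixel takes the darkest value in its neighbourhood.
    return _window_filter(image, ksize, min)


def dilate(image, ksize=3):
    # Dilation: each pixel takes the brightest value in its neighbourhood.
    return _window_filter(image, ksize, max)


def opening(image, ksize=3):
    # Opening = erosion followed by dilation (removes small bright regions).
    return dilate(erode(image, ksize), ksize)


def closing(image, ksize=3):
    # Closing = dilation followed by erosion (joins nearby bright regions).
    return erode(dilate(image, ksize), ksize)


# Toy image 1: a one-pixel bright speck plus a larger 3 x 3 bright block.
speckled = [[0] * 8 for _ in range(8)]
speckled[1][1] = 9
for y in range(4, 7):
    for x in range(4, 7):
        speckled[y][x] = 9

# Toy image 2: two bright blocks separated by a one-pixel dark gap (column 4).
split = [[0] * 9 for _ in range(7)]
for y in range(2, 5):
    for x in (2, 3, 5, 6):
        split[y][x] = 9

opened = opening(speckled)
closed = closing(split)
```

In this toy setting, `opened` no longer contains the speck while the larger block keeps its size, and `closed` has the gap between the two blocks filled while the surrounding dark margin stays dark.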
The test video images were convolved with 12 different kernels, and the
performance of each was compared. Figure 25 illustrates how the choice of
convolution kernel affects the accuracy of feature detection. Each kernel was also
compared for its effect on computation time, as shown in Figure 26 (see footnote 12).
Our experiments showed that convolving the images with a rectangular kernel of
size 2 × 2 provided the most accurate detection of key features (pupils and
nostrils). There was a slightly higher cost in terms of processing time, but this
was deemed to be an acceptable trade-off.
5.3.2 Filtering Local Features
Local features were detected using a nested Haar cascade. The Haar cascade would
often return one or more false positives for each local feature in addition to the
correct location of the feature. It was therefore necessary to filter the local
features to determine which candidate was the correct one.
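To make the filtering problem concrete, the following is a minimal sketch of one possible heuristic: keep the candidate whose centre lies closest to where the feature is expected to be. This is an illustrative assumption and not necessarily the filter used in our system. The function names `centre` and `pick_candidate`, the `(x, y, w, h)` rectangle layout and the example coordinates are all hypothetical.

```python
import math


def centre(rect):
    """Return the centre point of an (x, y, w, h) rectangle."""
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)


def pick_candidate(candidates, expected_centre):
    """Return the candidate rectangle nearest to expected_centre, or None.

    candidates: list of (x, y, w, h) tuples, e.g. detections for one feature.
    expected_centre: (x, y) position where the feature is assumed to lie.
    """
    if not candidates:
        return None

    def distance(rect):
        cx, cy = centre(rect)
        return math.hypot(cx - expected_centre[0], cy - expected_centre[1])

    return min(candidates, key=distance)


# Hypothetical detections for one feature: one plausible hit, two false positives.
detections = [(10, 10, 20, 20), (100, 40, 20, 20), (60, 50, 18, 18)]
best = pick_candidate(detections, expected_centre=(70, 60))
```

A distance-based rule like this is simple to reason about, but it depends entirely on having a good estimate of the expected position, for example from the geometry of the enclosing face region.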
12. The time in seconds refers to the time to process the entire video sequence.
The video sequences were 342 and 230 frames respectively, so our system is
somewhat slower than real time. Real-time speeds could easily be achieved in a
production system by exploiting hardware acceleration or parallel processing.