Real-Time Face Recognition from Surveillance Video - Intelligent Video Event Analysis and Understanding


5.3 Feature Extraction

5.3.1 Morphological Operations

To optimise blob detection, we experimented with a number of convolution kernels:

• Custom kernels for erosion and dilation

• OpenCV's built-in convolution kernels

• Morphological transformations for opening and closing the image

The opening and closing operations predefined in OpenCV are each a combination of erosion and dilation: opening is an erosion followed by a dilation, and closing is a dilation followed by an erosion. Their effects are as follows:

Opening an image removes small bright regions. The larger bright regions are isolated but retain their size. We tried opening the image because a relatively small bright region (an artefact of the subject wearing a nose ring) was interfering with our nostril detection algorithm.

Closing an image joins neighbouring bright regions by filling the small dark gaps between them. The larger dark regions remain dark and their size is essentially unchanged. We tried this to improve the contrast between the nostrils and pupils and the brighter areas that surround them.

Opening and then closing an image should, in theory, discard the smaller bright regions and then join the larger bright regions, while keeping the dark regions almost unchanged. Although this makes the nostrils more obvious to the human eye, OpenCV blob detection (which is based on contour detection) fails because the operation destroys edges.
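The behaviour of opening and closing described above can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the helper names (`erode`, `dilate`, `opening`, `closing`), the 3 × 3 square window and the toy images are assumptions chosen for illustration, and pixels outside the image are simply ignored. In OpenCV itself the corresponding operations are available through `cv2.morphologyEx` with `cv2.MORPH_OPEN` and `cv2.MORPH_CLOSE`.

```python
def _morph(img, size, pick):
    """Apply pick (min or max) over a size x size window centred on each pixel.

    size is assumed to be odd; pixels outside the image are ignored.
    """
    h, w = len(img), len(img[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            row.append(pick(window))
        out.append(row)
    return out

def erode(img, size=3):
    return _morph(img, size, min)   # bright regions shrink

def dilate(img, size=3):
    return _morph(img, size, max)   # bright regions grow

def opening(img, size=3):
    return dilate(erode(img, size), size)   # erosion, then dilation

def closing(img, size=3):
    return erode(dilate(img, size), size)   # dilation, then erosion

# Toy image: a 3 x 3 bright block plus one isolated bright pixel (the "speck").
speck = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0, 0, 0],
    [0, 9, 9, 9, 0, 0, 0],
    [0, 9, 9, 9, 0, 9, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
opened = opening(speck)   # the speck is removed, the block keeps its size

# Toy image: bright background, a 3 x 3 dark block and one isolated dark pixel.
holes = [[9] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        holes[y][x] = 0
holes[0][6] = 0
closed = closing(holes)   # the isolated dark pixel is filled, the block survives
```

In this toy case the larger dark block plays the role of a nostril or pupil: closing leaves it intact while filling in the single dark pixel.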

The test video images were convolved with 12 different kernels, and the performance of each was compared. Figure 25 illustrates how the choice of convolution kernel affects the accuracy of feature detection. Each kernel was also compared for its effect on computation time, as shown in Figure 26 (see footnote 12).

Our experiments showed that convolving the images with a rectangular kernel of size 2 × 2 provided the most accurate detection of key features (pupils and nostrils). There was a slightly higher cost in terms of processing time, but this was deemed to be an acceptable trade-off.
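A comparison of this kind, scoring each kernel on both detection accuracy and processing time, could be structured roughly as follows. This is only a sketch under stated assumptions: `compare_kernels`, its `preprocess` and `detect` callables, the pixel tolerance `tol` and the toy frames are all hypothetical and are not taken from the experiment described above.

```python
import time

def compare_kernels(kernels, frames, truth, preprocess, detect, tol=2):
    """Score each kernel by detection accuracy and total processing time.

    kernels:    dict mapping a kernel name to a kernel description
    frames:     list of images
    truth:      list of known (x, y) feature positions, one per frame
    preprocess: callable (frame, kernel) -> image    (hypothetical)
    detect:     callable (image) -> (x, y) or None   (hypothetical)
    """
    results = {}
    for name, kernel in kernels.items():
        hits = 0
        start = time.perf_counter()
        for frame, (tx, ty) in zip(frames, truth):
            found = detect(preprocess(frame, kernel))
            if (found is not None
                    and abs(found[0] - tx) <= tol
                    and abs(found[1] - ty) <= tol):
                hits += 1
        elapsed = time.perf_counter() - start   # time for the whole sequence
        results[name] = {"accuracy": hits / len(frames), "seconds": elapsed}
    return results

def darkest_pixel(image):
    """Stand-in detector: return (x, y) of the darkest pixel."""
    value, y, x = min((v, y, x)
                      for y, row in enumerate(image)
                      for x, v in enumerate(row))
    return x, y

# Toy data: two 3 x 3 frames, each with a single dark pixel at a known position.
frames = [
    [[9, 9, 9], [9, 0, 9], [9, 9, 9]],
    [[9, 9, 9], [9, 9, 9], [0, 9, 9]],
]
truth = [(1, 1), (0, 2)]
kernels = {"rect_2x2": (2, 2), "rect_3x3": (3, 3)}

# The identity preprocess is a placeholder for the real convolution step.
results = compare_kernels(kernels, frames, truth,
                          preprocess=lambda frame, kernel: frame,
                          detect=darkest_pixel)
```

Timing the whole sequence per kernel, rather than per frame, mirrors the way the timings are reported in footnote 12.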

5.3.2 Filtering Local Features

Local features were detected using a nested Haar cascade. The cascade would often return one or more false positives for each local feature in addition to the feature's correct location. It was therefore necessary to filter the candidates to determine which one was correct.
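The filtering rule itself is not described at this point, so the following is only one plausible heuristic, shown to make the idea of choosing among candidates concrete. It assumes that each candidate is an (x, y, w, h) rectangle, as returned by OpenCV's `detectMultiScale`, and that each feature has a rough expected position inside the face rectangle. The `EXPECTED` fractions and the `pick_candidate` helper are hypothetical.

```python
from math import hypot

# Hypothetical expected feature centres, as fractions of the face box
# width and height. These values are illustrative assumptions only.
EXPECTED = {
    "left_pupil": (0.30, 0.40),
    "right_pupil": (0.70, 0.40),
    "nose": (0.50, 0.65),
}

def pick_candidate(face, candidates, expected):
    """Return the candidate box whose centre is closest to the expected position.

    face and each candidate are (x, y, w, h) rectangles; expected is a pair of
    fractions of the face width and height. Returns None if there are no
    candidates.
    """
    fx, fy, fw, fh = face
    ex, ey = fx + expected[0] * fw, fy + expected[1] * fh

    def distance(box):
        x, y, w, h = box
        return hypot(x + w / 2 - ex, y + h / 2 - ey)

    return min(candidates, key=distance) if candidates else None

face = (100, 100, 200, 200)
candidates = [
    (150, 170, 20, 20),   # centred on the expected left-pupil position
    (240, 260, 20, 20),   # a false positive lower down the face
]
best = pick_candidate(face, candidates, EXPECTED["left_pupil"])
```

A distance-to-expected-position rule is simple to reason about, but it depends on the face rectangle being reasonably accurate and on the subject facing the camera.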

12. The time in seconds refers to the time taken to process the entire video sequence. The video sequences were 342 and 230 frames long respectively, so our system is somewhat slower than real time. Real-time speeds could easily be achieved in a production system by exploiting hardware acceleration or parallel processing.
