the output space is the space of possible bounding boxes, which can be parameterized by the top, bottom, left, and right coordinates of the region. These coordinates can take values between 0 and the frame size, which makes the problem one of structured regression.
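As a concrete illustration, a minimal sketch of this parameterization and the 0-to-frame-size constraint is given below; the class and helper names are illustrative choices, not from the paper.

from dataclasses import dataclass

@dataclass
class BoundingBox:
    # Output-space parameterization: top, bottom, left, right coordinates in pixels.
    top: int
    bottom: int
    left: int
    right: int

def clip_to_frame(box: BoundingBox, frame_h: int, frame_w: int) -> BoundingBox:
    # Each coordinate is constrained to lie between 0 and the frame size.
    return BoundingBox(
        top=min(max(box.top, 0), frame_h),
        bottom=min(max(box.bottom, 0), frame_h),
        left=min(max(box.left, 0), frame_w),
        right=min(max(box.right, 0), frame_w),
    )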
2.2 Structured Learning for Object Localization
In this method, we learn a prediction function which estimates the object position in each frame instead of learning a classifier. First, we divide the initial bounding box from the first frame into smaller boxes of a specified size, called sub-blocks. The tracker maintains the position of the object within a frame f^s, where s is the frame number. Given a set of input sub-block images x_1^s, ..., x_n^s ∈ X and their transformation positions y_1^s, ..., y_n^s ∈ Y, we learn a prediction function p : X → Y. In this method, the output space is the space of all transformations Y instead of the binary labels ±1. Next, the prediction function p is learned using the structured learning framework [13] according to:
p(x) = argmax_{y ∈ Y} d(x, y)    (1)
where d(x, y) is a discriminant function that should give a maximum value to pairs (x, y) that are well matched. In our approach, a pair (x, y) is a labeled example where y is the preferred transformation of the target object.
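A minimal sketch of the prediction rule in Eq. (1), assuming a finite set of candidate transformations and a discriminant function d supplied by the learned model (the function and parameter names are illustrative):

from typing import Callable, Iterable, TypeVar

X = TypeVar("X")  # sub-block image
Y = TypeVar("Y")  # transformation / object position

def predict(x: X, candidates: Iterable[Y], d: Callable[[X, Y], float]) -> Y:
    # Eq. (1): return the transformation y that maximizes the discriminant d(x, y).
    return max(candidates, key=lambda y: d(x, y))

In practice the candidate set would be the transformations sampled around the previous object position in the current frame.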
In our approach, the discriminant function has the form:
d(x, y) = ⟨w, Φ(x, y)⟩    (2)
where Φ(x, y) maps the pair (x, y) into an appropriate feature space in which the dot product with w is computed. In the learning process, the feature mapping Φ is defined by a joint kernel function, formed as:
k(x, y, x', y') = ⟨Φ(x, y), Φ(x', y')⟩    (3)
Consider training patterns x_1, ..., x_n ∈ X and their transformation positions y_1, ..., y_n ∈ Y. The image kernel function is used to compute statistics or features of two sub-block images and then compare them; overlapping sub-block regions share common features and related statistics. By using the kernel map, it becomes straightforward to incorporate image features into the structured output approach.
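One common way to realize such a joint kernel is to evaluate an image kernel between the sub-block patches selected by the two transformations. The sketch below assumes a Gaussian kernel on raw pixels and a simple cropping helper, neither of which is specified in the text:

import numpy as np

def extract_patch(frame: np.ndarray, y) -> np.ndarray:
    # Illustrative helper: crop the sub-block selected by transformation y = (top, bottom, left, right).
    top, bottom, left, right = y
    return frame[top:bottom, left:right]

def image_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 1e-4) -> float:
    # Illustrative Gaussian kernel comparing the pixel values of two equally sized sub-blocks.
    fa = a.astype(np.float64).ravel()
    fb = b.astype(np.float64).ravel()
    return float(np.exp(-gamma * np.sum((fa - fb) ** 2)))

def joint_kernel(frame_a: np.ndarray, y_a, frame_b: np.ndarray, y_b) -> float:
    # One possible realization of Eq. (3): compare the patches picked out by (x, y) and (x', y').
    return image_kernel(extract_patch(frame_a, y_a), extract_patch(frame_b, y_b))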
In this approach, we extract the Haar-like features introduced by Viola [8] to obtain features in each sub-block. A simple rectangular Haar-like feature is defined as the difference of the sums of pixels over areas inside the rectangle, which can be at any position and scale within the original frame. The values of the Haar-like features indicate the characteristics of a particular area inside the sub-blocks. Each feature type is used to indicate the presence or absence of particular characteristics in the sub-blocks, such as edges or changes in texture. We use 6 types of Haar-like features, as shown in Fig. 2.
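As an illustration of how such features can be computed efficiently, the sketch below evaluates a simple two-rectangle Haar-like feature with an integral image; the rectangle layout shown is only one example and is not necessarily among the six types used in Fig. 2:

import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    # Cumulative sums so that any rectangle sum can be read in constant time.
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii: np.ndarray, top: int, left: int, h: int, w: int) -> int:
    # Sum of pixels in the rectangle [top, top+h) x [left, left+w) using the integral image.
    def at(r: int, c: int) -> int:
        return int(ii[r - 1, c - 1]) if r > 0 and c > 0 else 0
    return at(top + h, left + w) - at(top, left + w) - at(top + h, left) + at(top, left)

def haar_two_rect_horizontal(ii: np.ndarray, top: int, left: int, h: int, w: int) -> int:
    # Two-rectangle feature: sum over the left half minus sum over the right half (edge-like response).
    half = w // 2
    return rect_sum(ii, top, left, h, half) - rect_sum(ii, top, left + half, h, half)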
 