the output space is the space of possible bounding boxes, which can be parameterized by the top, bottom, left, and right coordinates of the region. These coordinates can take values between 0 and the frame size, which makes the problem one of structured regression.
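As a concrete illustration, a minimal sketch of this parameterization and the 0-to-frame-size constraint is given below; the class and helper names are illustrative choices, not from the paper.

from dataclasses import dataclass

@dataclass
class BoundingBox:
    # Output-space parameterization: top, bottom, left, right coordinates in pixels.
    top: int
    bottom: int
    left: int
    right: int

def clip_to_frame(box: BoundingBox, frame_h: int, frame_w: int) -> BoundingBox:
    # Each coordinate is constrained to lie between 0 and the frame size.
    return BoundingBox(
        top=min(max(box.top, 0), frame_h),
        bottom=min(max(box.bottom, 0), frame_h),
        left=min(max(box.left, 0), frame_w),
        right=min(max(box.right, 0), frame_w),
    )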
2.2 Structured Learning for Object Localization
In this method, we learn a prediction function which estimates the object position in each frame instead of learning a classifier. First, we divide the initial bounding box from the first frame into smaller boxes of a specified size, called sub-blocks. The tracker maintains the position of the object within a frame f^s, where s is the frame number. Given a set of input sub-block images x_1^s, ..., x_n^s ∈ X and their transformation positions y_1^s, ..., y_n^s ∈ Y, we learn a prediction function p : X → Y. In this method, the output space is the space of all transformations Y instead of the binary labels ±1. Next, the prediction function p is learned using the structured learning framework [13] according to:
p(x) = argmax_{y ∈ Y} d(x, y)    (1)
where d(x, y) is a discriminant function that should give a maximum value to pairs (x, y) that are well matched. In our approach, a pair (x, y) is a labeled example where y is the preferred transformation of the target object.
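A minimal sketch of the prediction rule in Eq. (1), assuming a finite set of candidate transformations and a discriminant function d supplied by the learned model (the function and parameter names are illustrative):

from typing import Callable, Iterable, TypeVar

X = TypeVar("X")  # sub-block image
Y = TypeVar("Y")  # transformation / object position

def predict(x: X, candidates: Iterable[Y], d: Callable[[X, Y], float]) -> Y:
    # Eq. (1): return the transformation y that maximizes the discriminant d(x, y).
    return max(candidates, key=lambda y: d(x, y))

In practice the candidate set would be the transformations sampled around the previous object position in the current frame.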
In our approach, the discriminant function has the form:
d(x, y) = ⟨w, Φ(x, y)⟩    (2)
where Φ(x, y) maps the pair (x, y) into an appropriate feature space in which the dot product with w is computed. In the learning process, the feature mapping Φ is defined by a joint kernel function, formed as:
k(x, y, x', y') = ⟨Φ(x, y), Φ(x', y')⟩    (3)
Consider training patterns x_1, ..., x_n ∈ X and their transformation positions y_1, ..., y_n ∈ Y. The image kernel function is used to compute statistics or features of two sub-block images and then compare them; overlapping sub-block regions share common features and related statistics. By using the kernel map, it becomes straightforward to incorporate image features into the structured output approach.
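One common way to realize such a joint kernel is to evaluate an image kernel between the sub-block patches selected by the two transformations. The sketch below assumes a Gaussian kernel on raw pixels and a simple cropping helper, neither of which is specified in the text:

import numpy as np

def extract_patch(frame: np.ndarray, y) -> np.ndarray:
    # Illustrative helper: crop the sub-block selected by transformation y = (top, bottom, left, right).
    top, bottom, left, right = y
    return frame[top:bottom, left:right]

def image_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 1e-4) -> float:
    # Illustrative Gaussian kernel comparing the pixel values of two equally sized sub-blocks.
    fa = a.astype(np.float64).ravel()
    fb = b.astype(np.float64).ravel()
    return float(np.exp(-gamma * np.sum((fa - fb) ** 2)))

def joint_kernel(frame_a: np.ndarray, y_a, frame_b: np.ndarray, y_b) -> float:
    # One possible realization of Eq. (3): compare the patches picked out by (x, y) and (x', y').
    return image_kernel(extract_patch(frame_a, y_a), extract_patch(frame_b, y_b))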
In this approach, we extract the Haar-like features introduced by Viola [8] to obtain features in each sub-block. A simple rectangular Haar-like feature is defined as the difference of the sums of pixels over areas inside the rectangle, which can be at any position and scale within the original frame. The values of the Haar-like features indicate the characteristics of a particular area inside the sub-blocks. Each feature type is used to indicate the presence or absence of particular characteristics in the sub-blocks, such as edges or changes in texture. We use 6 types of Haar-like features, as shown in Fig. 2.
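As an illustration of how such features can be computed efficiently, the sketch below evaluates a simple two-rectangle Haar-like feature with an integral image; the rectangle layout shown is only one example and is not necessarily among the six types used in Fig. 2:

import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    # Cumulative sums so that any rectangle sum can be read in constant time.
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii: np.ndarray, top: int, left: int, h: int, w: int) -> int:
    # Sum of pixels in the rectangle [top, top+h) x [left, left+w) using the integral image.
    def at(r: int, c: int) -> int:
        return int(ii[r - 1, c - 1]) if r > 0 and c > 0 else 0
    return at(top + h, left + w) - at(top, left + w) - at(top + h, left) + at(top, left)

def haar_two_rect_horizontal(ii: np.ndarray, top: int, left: int, h: int, w: int) -> int:
    # Two-rectangle feature: sum over the left half minus sum over the right half (edge-like response).
    half = w // 2
    return rect_sum(ii, top, left, h, half) - rect_sum(ii, top, left + half, h, half)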
 