Mobile Landmark Recognition - Multimedia Database Retrieval: Technology and Applications



where $S(i, j)$ is the saliency map at location $(i, j)$, and $a$ [...] 1 and $b$ [...] 2 are used to adjust the relative compactness of the saliency values. Equation (5.3) results in the concentration of the activation into a few key locations.

5.3 Saliency-Aware Local Descriptor

The SIFT descriptor aims at detecting and describing local visual features in two steps. In the first step, the key points are localized, while in the second step, local descriptors are built for each key point. A given image is decomposed into a set of key points $X = \{x_1, \ldots, x_n\}$ with their corresponding SIFT descriptors $S = \{s_1, \ldots, s_n\}$. In the process of obtaining the descriptors, the gradient vector for each pixel in the key point's neighborhood is computed and the histogram of gradient directions is built. Thus, the descriptor can be represented as a set of gradient histograms, and can be denoted by $s(m, n, o)$, where $m$, $n$ and $o$ are respectively the indexes of the spatial bins and orientation channels.

A $16 \times 16$ neighborhood is partitioned into 16 sub-regions of $4 \times 4$ pixels each. For each pixel within a sub-region, the pixel's gradient vector is added to a histogram of gradient directions by quantizing each orientation to one of eight directions. Each entry of a bin is further weighted by $1 - d$, where $d$ is the geometric distance from the sample to the bin center. This reduces boundary effects as samples move between positions and orientations.
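The histogram construction described above can be sketched as follows. This is a minimal illustration rather than a reference SIFT implementation: the function name `sift_like_descriptor` is hypothetical, gradients are taken by clamped central differences, orientations are hard-quantized, and $d$ is assumed to be the Euclidean distance to the spatial bin center measured in units of the bin spacing (4 pixels).

```python
import math

def sift_like_descriptor(patch, grid=4, n_orient=8):
    """Build a grid x grid x n_orient gradient histogram from a square patch.

    Hypothetical helper: `patch` is a list of rows of intensities
    (16 x 16 for the layout described in the text).
    """
    size = len(patch)
    cell = size / grid  # sub-region width in pixels (4 for a 16 x 16 patch)
    hist = [[[0.0] * n_orient for _ in range(grid)] for _ in range(grid)]
    for i in range(size):          # row index
        for j in range(size):      # column index
            # Central differences, clamped at the patch border (assumption).
            gx = patch[i][min(j + 1, size - 1)] - patch[i][max(j - 1, 0)]
            gy = patch[min(i + 1, size - 1)][j] - patch[max(i - 1, 0)][j]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Quantize the gradient direction to one of n_orient directions.
            theta = math.atan2(gy, gx) % (2 * math.pi)
            o = int(theta / (2 * math.pi / n_orient)) % n_orient
            for m in range(grid):
                for n in range(grid):
                    # Center of spatial bin (m, n) in pixel coordinates.
                    ci = (m + 0.5) * cell - 0.5
                    cj = (n + 0.5) * cell - 0.5
                    # Distance to the bin center in units of the bin spacing.
                    d = math.hypot(i - ci, j - cj) / cell
                    if d < 1.0:
                        hist[m][n][o] += mag * (1.0 - d)
    return hist

# Usage: a horizontal intensity ramp has gradients along a single direction,
# so only one orientation channel per spatial bin receives any weight.
ramp = [[float(j) for j in range(16)] for _ in range(16)]
descriptor = sift_like_descriptor(ramp)
```

Because a sample contributes to every spatial bin whose center lies within one bin spacing, a pixel near a sub-region boundary is shared between neighboring bins instead of switching abruptly from one to the other.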

In order to incorporate the saliency information into the descriptor, when

calculating the histogram, each entry of a bin is weighted by the saliency weights:

$$
s(m, n, o) = \frac{\sum_{d_B(i,j) < 1} M_o(i,j)\,\bigl(1 - d_B(i,j)\bigr)\,S(i,j)}{\sum_{d_B(i,j) < 1} S(i,j)} \tag{5.4}
$$

where $M_o(i, j)$ represents the gradient magnitude at the location $(i, j)$ in the $o$-th orientation plane, $d_B(i, j)$ is the distance between the sample at $(i, j)$ and the center of the bin $B(m, n)$, $1 \le m \le 4$, and $1 \le n \le 4$.

Let $R$ denote a region of size $16 \times 16$ pixels, chosen for obtaining the descriptor $s$. The saliency value associated with the descriptor is obtained by weighting the saliency map $S(i, j)$ discussed in Eq. (5.3) by a Gaussian of scale $\sigma$ as follows:

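$$
S_R = \sum_{(i,j) \in R} S(i,j)\,\exp\!\left(-\frac{(i - R_x)^2 + (j - R_y)^2}{2\sigma^2}\right) \tag{5.5}
$$

where $S_R$ denotes the resulting saliency value of the descriptor and $(R_x, R_y)$ is the center of the region $R$.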
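One possible reading of Eqs. (5.4) and (5.5) is sketched below in plain Python. Several details are assumptions, not taken from the text: the function names `region_saliency` and `saliency_weighted_entry` are hypothetical, the Gaussian is normalized by $2\sigma^2$ in the exponent, the region center is taken as the geometric center of the patch, and $d_B$ is the Euclidean distance to the bin center in units of the bin spacing.

```python
import math

def region_saliency(S, sigma):
    """Gaussian-weighted sum of a saliency patch S around its center.

    Assumes the exponent -((i - Rx)^2 + (j - Ry)^2) / (2 * sigma^2).
    """
    h, w = len(S), len(S[0])
    rx, ry = (h - 1) / 2.0, (w - 1) / 2.0  # assumed center of the region R
    total = 0.0
    for i in range(h):
        for j in range(w):
            r2 = (i - rx) ** 2 + (j - ry) ** 2
            total += S[i][j] * math.exp(-r2 / (2.0 * sigma ** 2))
    return total

def saliency_weighted_entry(M_o, S, center, spacing):
    """One histogram entry s(m, n, o) with saliency weighting.

    M_o: gradient magnitudes in the o-th orientation plane.
    S: saliency map over the same patch.
    center: (row, column) center of the spatial bin B.
    spacing: bin spacing in pixels, used to normalize the distance d_B.
    """
    ci, cj = center
    num = 0.0
    den = 0.0
    for i in range(len(M_o)):
        for j in range(len(M_o[0])):
            d = math.hypot(i - ci, j - cj) / spacing
            if d < 1.0:  # only samples within one bin spacing contribute
                num += M_o[i][j] * (1.0 - d) * S[i][j]
                den += S[i][j]
    # Guard against a bin with no salient samples (assumption: return 0).
    return num / den if den > 0.0 else 0.0

# Usage on a 16 x 16 patch with uniform magnitude and uniform saliency.
M = [[1.0] * 16 for _ in range(16)]
S = [[1.0] * 16 for _ in range(16)]
entry = saliency_weighted_entry(M, S, center=(1.5, 1.5), spacing=4.0)
patch_saliency = region_saliency(S, sigma=4.0)
```

Under this reading, dividing by the summed saliency makes each entry invariant to a uniform rescaling of the saliency map, so only the relative distribution of saliency inside the bin's support changes the descriptor.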
