Indexing, Object Segmentation, and Event Detection in News and Sports Videos - Multimedia Database Retrieval: Technology and Applications

Database Reference

In-Depth Information

7.4.2

Object Segmentation

Graph Cut solves for the cut that minimizes the energy of an object/background

pixel assignment A :

)= ʻ p ∈ P R p ( A p )+

∑

(

B ( p , q )

(7.21)

(

) ∈

A p =

A q

where P represents all image pixels, N all unordered neighborhood pixel pairs, A

is a binary vector whose components A p specify assignments to pixel p in P , and

specifies the relative importance of the first and the second terms. R p is the prior

probability, represented by how likely the pixel is to be of object or background

label A in the set of pixels P . B ( p , q ) is the pairwise edge energy represented by the

probability that there exists an edge between the pair of pixels

(

)

in the set of

neighborhood pixels N .

To calculate R p , the pixel values are compared with the color histogram to

calculate how similar the pixel is to the user seeded region. To calculate B ( p , q ) ,

the pairwise pixel luminance difference I p −

I q is mapped to an exponential function

representing how likely is there to be an edge between the pair, shown in Eq. ( 7.22 ).

exp

− (

I p −

I q )

B ( p , q ) ∝

(7.22)

In the calculation of pairwise luminance difference, a pixel can be connected

to 4 or 8 of its immediate neighbors. The number of connections is set to reduce

processing complexity, and it is possible to connect a pixel not only to its adjacent

pixels, but also nearby pixels or pixels which are frames away.

Graph Cut can easily be extended for videos using a 3-D graph structure. The

temporal information added allows it to track a segmented object effectively. Most

implementations of Graph Cut require the user to seed the image by indicating the

desired region versus the background (in our implementation, blue and red strokes,

respectively). The unique strength of Graph Cut is that when given the connected

graph topology, it can propagate the information across the sequence of images.

One of the main problems in using the Graph Cut algorithm in 3-D for video

segmentation is in the growing complexity of the number of nodes and edges. Given

a graph G

, where N n is the number of nodes and N e is the number of edges,

which are both proportional to the number of pixels, the relationships of N e =

(

N n ,

N e )

4 N n

and N e =

6 N n hold for a 4-connected and 8-connected neighbor graph, respectively

(Fig. 7.10 ). In 3-D, connecting immediately to the neighboring temporal frames will

increase to N e =

6 N n and N e =

26 N n for 4 or 8-connected neighbors. With a 720

10 6 edges. Given that

the Graph Cut algorithm's polynomial run time depends on N n and N e , using Graph

Cut in 3-D is still quite infeasible at the pixel level. Unfortunately, the technique

does not scale well with resolution, especially for videos.

10 6 nodes and 207

480

100 pixel video, there will be 34

Multimedia Database Retrieval: Technology and Applications

Search WWH ::

Custom Search

Home