Database Reference
In-Depth Information
7.4.2
Object Segmentation
Graph Cut solves for the cut that minimizes the energy of an object/background
pixel assignment A :
)= ʻ p P R p ( A p )+
E
(
A
B ( p , q )
(7.21)
(
p
,
q
)
N
A p =
A q
where P represents all image pixels, N all unordered neighborhood pixel pairs, A
is a binary vector whose components A p specify assignments to pixel p in P , and
ʻ
specifies the relative importance of the first and the second terms. R p is the prior
probability, represented by how likely the pixel is to be of object or background
label A in the set of pixels P . B ( p , q ) is the pairwise edge energy represented by the
probability that there exists an edge between the pair of pixels
(
p
,
q
)
in the set of
neighborhood pixels N .
To calculate R p , the pixel values are compared with the color histogram to
calculate how similar the pixel is to the user seeded region. To calculate B ( p , q ) ,
the pairwise pixel luminance difference I p
I q is mapped to an exponential function
representing how likely is there to be an edge between the pair, shown in Eq. ( 7.22 ).
exp
2
(
I p
I q )
1
·
B ( p , q )
(7.22)
2
˃
2
p
,
q
In the calculation of pairwise luminance difference, a pixel can be connected
to 4 or 8 of its immediate neighbors. The number of connections is set to reduce
processing complexity, and it is possible to connect a pixel not only to its adjacent
pixels, but also nearby pixels or pixels which are frames away.
Graph Cut can easily be extended for videos using a 3-D graph structure. The
temporal information added allows it to track a segmented object effectively. Most
implementations of Graph Cut require the user to seed the image by indicating the
desired region versus the background (in our implementation, blue and red strokes,
respectively). The unique strength of Graph Cut is that when given the connected
graph topology, it can propagate the information across the sequence of images.
One of the main problems in using the Graph Cut algorithm in 3-D for video
segmentation is in the growing complexity of the number of nodes and edges. Given
a graph G
, where N n is the number of nodes and N e is the number of edges,
which are both proportional to the number of pixels, the relationships of N e =
(
N n ,
N e )
4 N n
and N e =
6 N n hold for a 4-connected and 8-connected neighbor graph, respectively
(Fig. 7.10 ). In 3-D, connecting immediately to the neighboring temporal frames will
increase to N e =
6 N n and N e =
26 N n for 4 or 8-connected neighbors. With a 720
×
10 6 edges. Given that
the Graph Cut algorithm's polynomial run time depends on N n and N e , using Graph
Cut in 3-D is still quite infeasible at the pixel level. Unfortunately, the technique
does not scale well with resolution, especially for videos.
10 6 nodes and 207
480
×
100 pixel video, there will be 34
×
×
Search WWH ::




Custom Search