\operatorname{argmax}_{\theta} \mathcal{L}(\theta). The L-BFGS algorithm is chosen because of its efficiency and performance in both theory [298] and application [279].
During the optimization process, the conditional probability in Eq. (9.8) is substituted by its explicit form from Eq. (9.4) to obtain Eq. (9.9). Then, the partial derivatives of a training sample \mathcal{L}_i(\theta) with respect to \theta_{1,k} and \theta_{2,k} are derived in Eqs. (9.10) and (9.11), respectively.
\mathcal{L}(\theta) = \sum_{i} \log p(y_i \mid X_i, \theta) - \frac{\|\theta\|^{2}}{2\sigma^{2}}    (9.8)

\mathcal{L}(\theta) = \sum_{i} \log \left[ \frac{1}{Z(X_i; \theta)} \sum_{h} e^{\Psi(y_i, h, X_i)} \right] - \frac{\|\theta\|^{2}}{2\sigma^{2}}    (9.9)
\frac{\partial \mathcal{L}_i(\theta)}{\partial \theta_{1,k}} = \sum_{t} P(h_t \mid y_i, X_i) f_k(y_i, h_t, X_i) - \sum_{t, y} P(h_t, y \mid X_i) f_k(y, h_t, X_i)    (9.10)

\frac{\partial \mathcal{L}_i(\theta)}{\partial \theta_{2,k}} = \sum_{t} P(h_{t-1}, h_t \mid y_i, X_i) f_k(y_i, h_{t-1}, h_t, X_i) - \sum_{t, y} P(h_{t-1}, h_t, y \mid X_i) f_k(y, h_{t-1}, h_t, X_i)    (9.11)
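To make the optimization concrete, the following minimal sketch maximizes an objective of the same shape as Eq. (9.8), a sum of conditional log-likelihoods minus an L2 penalty \|\theta\|^2/(2\sigma^2), using SciPy's L-BFGS routine. It is an illustration only: a multinomial-logistic p(y | x, \theta) stands in for the HCRF likelihood of Eq. (9.9), and the synthetic data, the value of \sigma, and all dimensions are assumptions made for the example.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d, n_classes, sigma = 200, 5, 3, 10.0          # illustrative sizes and sigma
X = rng.normal(size=(n, d))
true_W = rng.normal(size=(d, n_classes))
y = np.argmax(X @ true_W + rng.normal(size=(n, n_classes)), axis=1)

def neg_objective(theta):
    W = theta.reshape(d, n_classes)
    scores = X @ W                                  # unnormalized log p(y | x)
    scores -= scores.max(axis=1, keepdims=True)     # stabilize the softmax
    log_Z = np.log(np.exp(scores).sum(axis=1))      # per-sample log-partition
    log_lik = scores[np.arange(n), y] - log_Z       # log p(y_i | X_i, theta)
    penalty = (theta @ theta) / (2.0 * sigma ** 2)  # ||theta||^2 / (2 sigma^2)
    return -(log_lik.sum() - penalty)               # minimize the negative of Eq. (9.8)

def neg_gradient(theta):
    W = theta.reshape(d, n_classes)
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)
    p = np.exp(scores); p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(n_classes)[y]
    grad = X.T @ (onehot - p) - W / sigma ** 2      # gradient of the regularized log-likelihood
    return -grad.ravel()

result = minimize(neg_objective, np.zeros(d * n_classes),
                  jac=neg_gradient, method="L-BFGS-B")
print("converged:", result.success, "objective:", -result.fun)

In a full HCRF implementation, the gradient function would instead accumulate the expectation terms of Eqs. (9.10) and (9.11), with the marginals P(h_t | y_i, X_i) and P(h_t, y | X_i) obtained from a forward-backward pass over the hidden-state chain.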
9.3.3.2 Comparison with Conditional Random Field (CRF) and Hidden Markov Model (HMM)
For comparison purposes, we also utilized conventional CRF models, as depicted in Fig. 9.4b. Following the definitions in [299], the conditional probability function is given in Eq. (9.12), with the normalization factor in Eq. (9.13). The potential function is defined in Eq. (9.14), where v_j(Y_{t-1}, Y_t, x) is a transition feature function between state positions t-1 and t within the observation sequence, while s_k(Y_t, x) is a state feature function at state position t. The parameters \lambda_j and \mu_k are estimated for the transition and state feature functions, respectively.
P(Y \mid x) = \frac{1}{Z(x)} \cdot \exp\left( \sum_{t=1}^{T} F(Y, x, t) \right)    (9.12)

Z(x) = \sum_{Y} \exp\left( \sum_{t=1}^{T} F(Y, x, t) \right)    (9.13)

F(Y, x, t) = \sum_{j} \lambda_j v_j(Y_{t-1}, Y_t, x) + \sum_{k} \mu_k s_k(Y_t, x)    (9.14)
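As a concrete reading of Eqs. (9.12)-(9.14), the brute-force sketch below scores every label sequence of a very short chain to form Z(x) and then evaluates P(Y | x) for one sequence. The feature functions v_j and s_k, their weights \lambda_j and \mu_k, the label set, the sequence length, and the handling of the first position (a fixed start label in place of Y_0) are all illustrative assumptions, not the features used in this chapter.

import itertools
import numpy as np

labels = [0, 1]                          # possible values of Y_t (illustrative)
T = 4                                    # sequence length (illustrative)
x = np.array([0.2, -1.0, 0.5, 1.3])      # toy observation sequence

lam = np.array([1.0, -0.5])              # weights lambda_j for transition features
mu = np.array([0.8])                     # weights mu_k for state features

def v(j, y_prev, y_t, x_obs):
    # transition feature functions v_j(Y_{t-1}, Y_t, x)
    if j == 0:
        return 1.0 if y_prev == y_t else 0.0
    return 1.0 if (y_prev, y_t) == (0, 1) else 0.0

def s(k, y_t, x_obs, t):
    # state feature function s_k(Y_t, x) evaluated at position t
    return x_obs[t] if y_t == 1 else 0.0

def F(Y, x_obs, t):
    # Eq. (9.14): F(Y, x, t) = sum_j lambda_j v_j + sum_k mu_k s_k
    y_prev = Y[t - 1] if t > 0 else 0    # fixed start label for t = 0 (an assumption)
    trans = sum(lam[j] * v(j, y_prev, Y[t], x_obs) for j in range(len(lam)))
    state = sum(mu[k] * s(k, Y[t], x_obs, t) for k in range(len(mu)))
    return trans + state

def score(Y, x_obs):
    # exp of the summed potentials, i.e. the numerator of Eq. (9.12)
    return np.exp(sum(F(Y, x_obs, t) for t in range(T)))

# Eq. (9.13): Z(x) sums the exponentiated scores over every label sequence Y.
Z = sum(score(Y, x) for Y in itertools.product(labels, repeat=T))

# Eq. (9.12): P(Y | x) = exp(sum_t F(Y, x, t)) / Z(x)
Y_example = (0, 1, 1, 0)
print("P(Y|x) =", score(Y_example, x) / Z)

Enumerating every Y is only feasible for toy sequences; in practice Z(x) and the required marginals are computed with the forward-backward recursion over the chain.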
 