$\hat{\theta} = \arg\max_{\theta} \mathcal{L}(\theta)$. The L-BFGS algorithm is chosen for this optimization because of its efficiency and performance, supported by both theory [298] and application [279].
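The optimization step can be sketched as follows. This is a minimal illustration, not the chapter's implementation: a simple logistic-regression log-likelihood with the same Gaussian regularizer $-\|\theta\|^2/(2\sigma^2)$ stands in for the HCRF objective, and the synthetic data and $\sigma^2$ value are assumptions for the demo. SciPy's L-BFGS-B routine maximizes the regularized log-likelihood by minimizing its negative:

```python
import numpy as np
from scipy.optimize import minimize

# Toy regularized log-likelihood (logistic regression stands in for the
# HCRF objective): maximize sum_i log p(y_i | x_i, theta) - ||theta||^2/(2*sigma^2).
# L-BFGS minimizes, so both the objective and its gradient are negated.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)
sigma2 = 10.0  # assumed regularization strength for the demo

def neg_log_likelihood(theta):
    z = X @ theta
    log_p = -np.logaddexp(0.0, -z)     # log sigmoid(z), numerically stable
    log_1mp = -np.logaddexp(0.0, z)    # log (1 - sigmoid(z))
    ll = np.sum(y * log_p + (1 - y) * log_1mp) - theta @ theta / (2 * sigma2)
    grad = X.T @ (y - 1.0 / (1.0 + np.exp(-z))) - theta / sigma2
    return -ll, -grad                  # negate for minimization

res = minimize(neg_log_likelihood, np.zeros(3), jac=True, method="L-BFGS-B")
print(res.success, res.x)
```

Because the objective is concave in $\theta$ (log-likelihood plus a concave quadratic penalty), L-BFGS converges to the unique maximizer; the same pattern applies to the HCRF objective once its gradient (Eqs. 9.10 and 9.11) is supplied as `jac`.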
During the optimization process, the conditional probability in Eq. (9.8) is substituted by the explicit form in Eq. (9.4) to obtain Eq. (9.9). Then, the partial derivatives of a training sample's contribution $\mathcal{L}_i(\theta)$ with respect to $\theta^1_k$ and $\theta^2_k$ are derived in Eqs. (9.10) and (9.11), respectively.
$$\mathcal{L}(\theta) = \sum_i \log p(y_i \mid X_i, \theta) - \frac{\|\theta\|^2}{2\sigma^2} \qquad (9.8)$$
$$\mathcal{L}(\theta) = \sum_i \log \frac{\sum_{h} e^{\Psi(y_i, h, X_i; \theta)}}{Z(X_i; \theta)} - \frac{\|\theta\|^2}{2\sigma^2} \qquad (9.9)$$
$$\frac{\partial \mathcal{L}_i(\theta)}{\partial \theta^1_k} = \sum_t P(h_t \mid y_i, X_i)\, f^1_k(y_i, h_t, X_i) - \sum_{t,\, y} P(h_t, y \mid X_i)\, f^1_k(y, h_t, X_i) \qquad (9.10)$$
$$\frac{\partial \mathcal{L}_i(\theta)}{\partial \theta^2_k} = \sum_t P(h_{t-1}, h_t \mid y_i, X_i)\, f^2_k(y_i, h_{t-1}, h_t, X_i) - \sum_{t,\, y} P(h_{t-1}, h_t, y \mid X_i)\, f^2_k(y, h_{t-1}, h_t, X_i) \qquad (9.11)$$
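Both gradients have the same structure: a clamped expectation of the feature (hidden states conditioned on the true label $y_i$) minus a model expectation (hidden states and labels both free). The sketch below verifies this structure numerically on a deliberately tiny model — a single time step with indicator features, so $\theta$ is a small table and all expectations can be computed by direct enumeration; the sizes and random parameters are assumptions for the demo, not the chapter's model:

```python
import numpy as np

# Gradient of one sample's log-likelihood in a one-time-step HCRF.
# Features are indicators f_{y,h} = 1[label=y, hidden=h], so theta is a
# (num_labels, num_hidden) table and Psi(y, h) = theta[y, h].
# Eq. (9.10) then reads: dL_i/dtheta[y,h] = P(h | y_i) * 1[y == y_i] - P(h, y).
rng = np.random.default_rng(1)
num_labels, num_hidden = 2, 3
theta = rng.normal(size=(num_labels, num_hidden))
y_i = 0  # observed label of the training sample

joint = np.exp(theta)                          # unnormalized scores over (y, h)
p_yh = joint / joint.sum()                     # model expectation P(h, y | X_i)
p_h_given_yi = joint[y_i] / joint[y_i].sum()   # clamped expectation P(h | y_i, X_i)

grad = -p_yh.copy()
grad[y_i] += p_h_given_yi                      # clamped minus model expectation

# Check against a central-difference numerical gradient of log p(y_i | X_i).
def log_lik(t):
    j = np.exp(t)
    return np.log(j[y_i].sum()) - np.log(j.sum())

eps = 1e-6
num_grad = np.zeros_like(theta)
for idx in np.ndindex(*theta.shape):
    d = np.zeros_like(theta)
    d[idx] = eps
    num_grad[idx] = (log_lik(theta + d) - log_lik(theta - d)) / (2 * eps)

print(np.max(np.abs(grad - num_grad)))  # small: finite-difference error only
```

In a real sequence model the clamped and model expectations in Eqs. (9.10) and (9.11) are computed with forward-backward recursions rather than enumeration, but the difference-of-expectations form is identical.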
9.3.3.2 Comparison with Conditional Random Field (CRF) and Hidden Markov Model (HMM)
For comparison purposes, we also utilized conventional CRF models, as depicted in Fig. 9.4b. Following the definitions in [299], the conditional probability function is shown in Eq. (9.12), with the normalization factor in Eq. (9.13). The potential function is defined in Eq. (9.14), where $v_j(Y_{t-1}, Y_t, x)$ is a transition feature function between state positions $t-1$ and $t$ within the observation sequence, and $s_k(Y_t, x)$ is a state feature function at state position $t$. The parameters $\lambda_j$ and $\mu_k$ are estimated for the transition and state feature functions, respectively.
$$P(Y \mid x) = \frac{1}{Z(x)} \exp\left(\sum_{t=1}^{T} F(Y, x, t)\right) \qquad (9.12)$$
$$Z(x) = \sum_{Y} \exp\left(\sum_{t=1}^{T} F(Y, x, t)\right) \qquad (9.13)$$
$$F(Y, x, t) = \sum_{j} \lambda_j\, v_j(Y_{t-1}, Y_t, x) + \sum_{k} \mu_k\, s_k(Y_t, x) \qquad (9.14)$$
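Eqs. (9.12)–(9.14) can be checked by brute force on a chain short enough to enumerate every label sequence. The sketch below is illustrative only: the two states, three positions, indicator-style features, and random $\lambda_j$, $\mu_k$ weights are assumptions for the demo, not the chapter's model, and $t = 0$ is treated as having no incoming transition:

```python
import itertools
import numpy as np

# Brute-force check of Eqs. (9.12)-(9.14) on a tiny linear-chain CRF.
num_states, T = 2, 3
rng = np.random.default_rng(2)
lam = rng.normal(size=(num_states, num_states))  # lambda_j: transition weights
mu = rng.normal(size=(num_states, T))            # mu_k: state weights per position

def F(Y, t):
    # Eq. (9.14): weighted transition feature plus weighted state feature.
    trans = lam[Y[t - 1], Y[t]] if t > 0 else 0.0  # no transition into t = 0
    return trans + mu[Y[t], t]

def score(Y):
    # Numerator of Eq. (9.12): exp of the summed potentials along the chain.
    return np.exp(sum(F(Y, t) for t in range(T)))

# Eq. (9.13): Z(x) sums the score over every possible label sequence Y.
all_Y = list(itertools.product(range(num_states), repeat=T))
Z = sum(score(Y) for Y in all_Y)
probs = {Y: score(Y) / Z for Y in all_Y}
print(sum(probs.values()))  # normalization check: sums to 1 up to rounding
```

The enumeration costs $O(|S|^T)$ and is only feasible for toy sizes; in practice $Z(x)$ is computed with the forward algorithm in $O(T\,|S|^2)$.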