Digital Signal Processing Reference
\[ c_{i1} = \beta \, p(ph_i \mid ph_i) \tag{6.3} \]
\[ c_{i2} = \alpha \, (1 - c_{i1}) \tag{6.4} \]
\[ c_{i3} = (1 - \alpha)(1 - c_{i1}) \tag{6.5} \]
\[ c_{ij} = 0 \quad \text{for } j \ge 4 \tag{6.6} \]
By using $C$, each phoneme can be confused only with its two most likely phonemes. Additionally, the factor $\beta$ allows the confusion level among phonemes to be modified. If $\beta \, p(ph_i \mid ph_i) \ge 1$, then $c_{i1} = 1$.
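As a minimal sketch of how one row of $C$ could be assembled, the Python function below follows the rules above. It assumes that the row entries are ordered by confusion rank (the phoneme itself first, then its two most likely confusions) and that the third entry receives the remaining mass $(1 - \alpha)(1 - c_{i1})$. The function name `confusion_row` and the numeric values are illustrative and do not come from the text.

```python
def confusion_row(p_self, alpha, beta, n_phonemes):
    """Build one row [c_i1, ..., c_in] of the confusion matrix C.

    p_self     : p(ph_i | ph_i), probability that ph_i is recognized as itself
    alpha      : share of the remaining mass given to the most likely confusion
    beta       : factor that scales the confusion level
    n_phonemes : row length n (hypothetical parameter, must be >= 3)
    """
    if n_phonemes < 3:
        raise ValueError("need at least three phonemes per row")
    c1 = min(1.0, beta * p_self)        # scaled self-probability, clipped at 1
    c2 = alpha * (1.0 - c1)             # most likely confusion
    c3 = (1.0 - alpha) * (1.0 - c1)     # second most likely confusion (assumed)
    return [c1, c2, c3] + [0.0] * (n_phonemes - 3)   # no other confusions


# Illustrative values only.
row = confusion_row(p_self=0.8, alpha=0.7, beta=1.1, n_phonemes=5)
clipped = confusion_row(p_self=0.8, alpha=0.7, beta=2.0, n_phonemes=5)
```

Under these assumptions every row sums to one, and a large $\beta$ (as in `clipped`) removes all confusion for that phoneme.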
For setting $T_{conf}$, two options are investigated. In the first option, $T_{conf}$ is set to a constant value. In the second option, $T_{conf}$ (notated as $T_{conf}^{i}$) varies depending on the phoneme emitted. If $q_t = ph_i$, then:
\[ T_{conf}^{i} = \frac{l_i}{\lambda} \tag{6.7} \]
where $l_i$ is the average length in frames of phoneme $ph_i$ and $\lambda$ is a constant. Making $T_{conf}$ depend on $l_i$ prevents long periods from affecting short phonemes, which would otherwise cause a high number of phoneme deletions.
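The phoneme-dependent option can be illustrated with a short Python sketch. It assumes that (6.7) is read as the ratio of the average phoneme length to $\lambda$. The function name `phoneme_thresholds`, the phoneme labels, and the lengths are hypothetical values chosen only for illustration.

```python
def phoneme_thresholds(avg_length_frames, lam):
    """Return a per-phoneme T_conf, assuming T_conf = l_i / lambda.

    avg_length_frames : dict mapping phoneme label -> average length l_i in frames
    lam               : the constant lambda (must be positive)
    """
    if lam <= 0:
        raise ValueError("lambda must be positive")
    return {ph: length / lam for ph, length in avg_length_frames.items()}


# Made-up average lengths: a long vowel and a short plosive.
avg_length_frames = {"a": 12.0, "t": 4.0}
t_conf = phoneme_thresholds(avg_length_frames, lam=2.0)
```

With these made-up numbers the short phoneme receives a proportionally shorter $T_{conf}$ than the long one, which is the behavior the text asks for.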
6.1.4 Windowing
A windowing module is introduced to smooth the posterior probabilities during phoneme transitions. The impulse response of this module is:
\[ w(t) = \sum_{k=-1}^{1} w_k \, \delta(t - k) \tag{6.8} \]
where $w_{-1} = w_1 = 0.15$ and $w_0 = 0.7$. The window $w(t)$ has been determined heuristically.
The smoothed output, written here as $\tilde{x}_i(t)$ to distinguish it from the input $x_i(t)$, is obtained by convolution:
\[ \tilde{x}_i(t) = w(t) * x_i(t) \tag{6.9} \]
6.1.5 MIMO Channel
A Multiple Input Multiple Output (MIMO) channel [Linder 05] is used. It is a linear time-invariant system that represents a simplified model of the interaction among phonemes. The impulse response of the channel is given by:
\[ H(t) = \begin{bmatrix} h_{11}(t) & h_{12}(t) & \cdots & h_{1n}(t) \\ h_{21}(t) & h_{22}(t) & \cdots & h_{2n}(t) \\ \vdots & \vdots & \ddots & \vdots \\ h_{n1}(t) & h_{n2}(t) & \cdots & h_{nn}(t) \end{bmatrix} \tag{6.10} \]
and the received signal is obtained by convolution: