$$ c_{i1} = \beta\, p(ph_i \mid ph_i) \tag{6.3} $$

$$ c_{i2} = \alpha\,(1 - c_{i1}) \tag{6.4} $$

$$ c_{i3} = (1 - \alpha)(1 - c_{i1}) \tag{6.5} $$

$$ c_{ij} = 0 \quad \text{for } j \geq 4 \tag{6.6} $$
By using $C$, each phoneme can only be confused with the two most likely phonemes. In addition, the factor $\beta$ allows the confusion level among phonemes to be modified. If $\beta\, p(ph_i \mid ph_i) > 1$, then $c_{i1} = 1$.
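As an illustration, the following minimal sketch builds one row of $C$ from $p(ph_i \mid ph_i)$, $\alpha$ and $\beta$, including the clipping of $c_{i1}$ to 1. The function names `confusion_row` and `sparse_confusion` are hypothetical, and so is the assumption that the two competing phonemes are ranked by an estimated row of probabilities $p(ph_j \mid ph_i)$ passed in as a dictionary; the text does not say how that ranking is obtained.

```python
from typing import Dict, List


def confusion_row(p_correct: float, alpha: float, beta: float) -> List[float]:
    """Return [c_i1, c_i2, c_i3] for one phoneme ph_i (Eqs. 6.3 to 6.6).

    p_correct stands for p(ph_i | ph_i). Entries c_ij with j >= 4 are zero
    (Eq. 6.6) and are therefore not stored.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    c1 = min(beta * p_correct, 1.0)  # Eq. 6.3, set to 1 when beta * p > 1
    rest = 1.0 - c1                  # probability mass left for confusions
    c2 = alpha * rest                # Eq. 6.4: most likely competitor
    c3 = (1.0 - alpha) * rest        # Eq. 6.5: second most likely competitor
    return [c1, c2, c3]


def sparse_confusion(probs: Dict[str, float], target: str,
                     alpha: float, beta: float) -> Dict[str, float]:
    """Map the row onto phoneme labels (hypothetical helper).

    probs[ph] is assumed to hold an estimate of p(ph | target) for every
    phoneme; the two largest competitors receive c_i2 and c_i3.
    """
    competitors = sorted((ph for ph in probs if ph != target),
                         key=probs.get, reverse=True)[:2]
    if len(competitors) < 2:
        raise ValueError("need at least two competing phonemes")
    c1, c2, c3 = confusion_row(probs[target], alpha, beta)
    return {target: c1, competitors[0]: c2, competitors[1]: c3}
```

Because $c_{i2} + c_{i3} = 1 - c_{i1}$, every row sums to one, so it remains a valid probability distribution for any $\alpha \in [0, 1]$; raising $\beta$ moves probability mass from the two competitors back to the correct phoneme.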
Two modalities are investigated for setting $T_{conf}$. In the first option, $T_{conf}$ is set to a constant value. In the second option, $T_{conf}$ varies depending on the emitted phoneme. If $q_t = ph_i$, then:
$$ T_{conf} = \frac{l_i}{\lambda} \tag{6.7} $$
where $l_i$ is the average length of $ph_i$ in frames and $\lambda$ is a constant. Making $T_{conf}$ depend on $l_i$ prevents long periods from affecting short phonemes, which would otherwise result in a high number of phoneme deletions.
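The two modalities can be expressed as a small lookup table, sketched below in Python. The sketch only evaluates the constant option or Eq. (6.7); how $T_{conf}$ is consumed by the rest of the model lies outside this excerpt and is not modelled. The function name `confusion_thresholds` and the example phoneme lengths are hypothetical.

```python
from typing import Dict, Optional


def confusion_thresholds(avg_len: Dict[str, float],
                         lam: Optional[float] = None,
                         constant: Optional[float] = None) -> Dict[str, float]:
    """Return T_conf for every phoneme.

    avg_len[ph] is l_i, the average length of ph in frames.
    Exactly one option must be chosen:
      constant -> first modality, the same T_conf for every phoneme;
      lam      -> second modality, T_conf = l_i / lambda (Eq. 6.7).
    """
    if (lam is None) == (constant is None):
        raise ValueError("pass exactly one of lam or constant")
    if constant is not None:
        return {ph: float(constant) for ph in avg_len}
    if lam <= 0:
        raise ValueError("lambda must be positive")
    return {ph: l_i / lam for ph, l_i in avg_len.items()}


# Hypothetical average lengths in frames for two phonemes.
table = confusion_thresholds({"a": 12.0, "t": 4.0}, lam=2.0)
q_t = "t"            # phoneme emitted at frame t
t_conf = table[q_t]  # 2.0
```

Precomputing the table once per phoneme inventory reduces the per-frame work to a single lookup on $q_t$, and a phoneme with a short average length receives a proportionally smaller $T_{conf}$, which is the behaviour the text motivates.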
6.1.4 Windowing
A windowing module is introduced to smooth the posterior probabilities during phoneme transitions. In this work, the impulse response of this module is:
$$ w(t) = \sum_{k=-1}^{1} w(k)\, \delta(t - k) \tag{6.8} $$

where $w(-1) = w(1) = 0.15$ and $w(0) = 0.7$. These coefficients of $w(t)$ were determined heuristically.
The output $\tilde{x}_i(t)$ is obtained by convolving each input $x_i(t)$ with $w(t)$:

$$ \tilde{x}_i(t) = w(t) * x_i(t) \tag{6.9} $$
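A minimal NumPy sketch of Eqs. (6.8) and (6.9) follows. It assumes that the posteriors are stored as a frames-by-phonemes array whose column $i$ holds $x_i(t)$, and that frames outside the utterance are zero; the array layout, the zero padding and the name `smooth_posteriors` are assumptions, not details from the text.

```python
import numpy as np

# Taps of w(t) at k = -1, 0, 1 (Eq. 6.8). They sum to 1, so the window
# has unit gain and leaves a constant posterior track unchanged.
W = np.array([0.15, 0.7, 0.15])


def smooth_posteriors(x: np.ndarray) -> np.ndarray:
    """Apply Eq. 6.9 to every column of x (shape: frames x phonemes).

    Expands y(t) = sum_k w(k) x(t - k) for k = -1, 0, 1 as
    w(-1) x(t+1) + w(0) x(t) + w(1) x(t-1). Frames outside the
    utterance are taken as zero (an assumption), which attenuates
    the first and last frame.
    """
    x = np.asarray(x, dtype=float)
    xp = np.pad(x, ((1, 1), (0, 0)))  # one zero frame before and after
    return W[0] * xp[2:] + W[1] * xp[1:-1] + W[2] * xp[:-2]
```

For interior frames the per-frame sum of the smoothed posteriors equals $\sum_k w(k) \sum_i x_i(t-k) = 1$ whenever the inputs are normalized posteriors, because the taps sum to one; only the zero-padded edge frames drop below 1. Per column, `np.convolve(x[:, i], W, mode="same")` gives the same result when the track has at least three frames.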
6.1.5 MIMO Channel
A Multiple Input Multiple Output (MIMO) channel [Linder 05] is used. It is a linear time-invariant system that serves as a simplified model of the interaction among phonemes. The impulse response of the channel is given by:
$$
H(t) =
\begin{bmatrix}
h_{11}(t) & h_{12}(t) & \cdots & h_{1n}(t) \\
h_{21}(t) & h_{22}(t) & \cdots & h_{2n}(t) \\
\vdots & \vdots & \ddots & \vdots \\
h_{n1}(t) & h_{n2}(t) & \cdots & h_{nn}(t)
\end{bmatrix}
\tag{6.10}
$$
and the received signal is obtained by convolution: