Digital Signal Processing Reference
In-Depth Information
[Figure 9.23: four panels, each plotted against samples. Panels (a) Stationary voiced speech and (b) Transitory speech show the cross-correlation and squared error together with the original speech and synthesized speech. Panels (c) Stationary voiced LPC residual and (d) Transitory LPC residual show the cross-correlation and squared error together with the LPC residual and LPC excitation.]
Figure 9.23 Squared error, $E_i$, $E_i^r$, and cross-correlation, $R_i$, $R_i^r$, values
In order to estimate the normalized residual cross-correlation, $R_i^r$, and the residual squared error, $E_i^r$, equations (9.52) and (9.53) are repeated with $s(n)$ and $\hat{s}(n)$ replaced by $r(n)$ and $\hat{r}(n)$ respectively. Figure 9.23 depicts $E_i$, $R_i$, the original speech $s(n)$, and the synthesized speech $\hat{s}(n)$. $E_i$ and $R_i$ are aligned with the corresponding pitch cycles of the speech waveforms, and the speech waveforms are shifted down for clarity. Examples of the residual domain signals, the LPC residual $r(n)$, the LPC excitation $\hat{r}(n)$, $E_i^r$, and $R_i^r$, are also shown in the figure.
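The per-cycle measures can be illustrated with a minimal Python sketch. Equations (9.52) and (9.53) are not reproduced in this section, so the forms used here are assumptions: the squared error is taken as the error energy divided by the reference energy over one pitch cycle, and the cross-correlation as the inner product divided by the geometric mean of the two energies. The function names, the `boundaries` list of pitch-cycle start indices, and the toy sinusoidal signals are all hypothetical and chosen only for illustration.

```python
import math

def cycle_metrics(ref, syn, eps=1e-12):
    """Return (E, R) for one pitch cycle.

    Assumed definitions (not taken from the source equations):
      E = sum((ref - syn)^2) / sum(ref^2)
      R = sum(ref * syn) / sqrt(sum(ref^2) * sum(syn^2))
    """
    if len(ref) != len(syn):
        raise ValueError("segments must have equal length")
    e_ref = sum(x * x for x in ref)
    e_syn = sum(y * y for y in syn)
    err = sum((x - y) ** 2 for x, y in zip(ref, syn))
    cross = sum(x * y for x, y in zip(ref, syn))
    # eps guards against division by zero on silent segments
    return err / (e_ref + eps), cross / (math.sqrt(e_ref * e_syn) + eps)

def per_cycle_metrics(ref, syn, boundaries):
    """Evaluate (E, R) on each pitch cycle delimited by consecutive boundaries."""
    return [cycle_metrics(ref[a:b], syn[a:b])
            for a, b in zip(boundaries, boundaries[1:])]

# Toy signals: a sinusoid with a 50-sample "pitch period".
period = 50
ref = [math.sin(2 * math.pi * n / period) for n in range(4 * period)]
# First two cycles: well-matched synthesis (slightly scaled copy).
# Last two cycles: poorly matched synthesis (large phase offset).
syn = [0.95 * x for x in ref[:2 * period]]
syn += [math.sin(2 * math.pi * n / period + 1.2)
        for n in range(2 * period, 4 * period)]

boundaries = [0, 50, 100, 150, 200]
metrics = per_cycle_metrics(ref, syn, boundaries)
for i, (E, R) in enumerate(metrics):
    print(f"cycle {i}: E = {E:.4f}, R = {R:.4f}")
```

Under these assumed definitions, the well-matched cycles give a squared error far below unity and a correlation close to unity, while the mismatched cycles give a larger error and a lower correlation. The residual-domain values would be obtained by calling the same function with the LPC residual and LPC excitation in place of the speech signals.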
For stationary voiced speech, the squared error, $E_i$, is usually much lower than unity and the normalized cross-correlation, $R_i$, is close to unity. However, the harmonic model fails at the transitions, which results in larger errors and lower correlation values. The estimated normalized cross-correlation and