Digital Signal Processing Reference
Strictly speaking, this is a cross-correlation, and the result is not axis-symmetrical.
Further, negative values may result, as opposed to the normal ACF. It is, however,
better suited to short analysis windows, because there the fading effect of the
stationary approach is particularly significant. Overall, however, the stationary
approach is preferred.
6.2.1.4 Spectrum and Cepstrum
Because speech and most audio signals are generally non-stationary processes that
can be considered 'quasi-stationary' only for short time periods, one determines short
time spectra instead of transforming the whole signal into the spectral domain [2, 18].
From the time signal s(k) and a suitable window function w(k) we can determine
the short time spectrum at time n, with k as the summation variable of the Fourier
transformation. The short time spectrum is thus a function of time n and frequency m.
With the DFT given as

\[
S(m) = \sum_{k=0}^{N-1} s(k)\, e^{-j 2\pi m k / N}, \tag{6.24}
\]

the complex short time spectrum S(m, n) is obtained by [3]:

\[
S(m, n) = \sum_{k=n-N/2}^{n+N/2-1} s(k)\, w(n-k-1)\, e^{-j 2\pi m k / N}. \tag{6.25}
\]
Note that, in implementations, the Fast Fourier Transform (FFT) is commonly
used to compute the DFT.
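The relation between the DFT, the FFT, and a short time spectrum can be illustrated with a minimal sketch in Python with NumPy. The function names `dft` and `short_time_spectrum`, the Hann window, the frame length, the hop size, and the test tone are all assumptions made for illustration. The frame alignment used here (frames starting at multiples of the hop size) is also an assumption and may differ from the text's formulation in window indexing and phase reference.

```python
import numpy as np

def dft(frame):
    """Direct O(N^2) DFT: S(m) = sum_k s(k) * exp(-j*2*pi*m*k/N)."""
    N = len(frame)
    k = np.arange(N)
    m = k.reshape(-1, 1)                      # one row per frequency bin
    return np.exp(-2j * np.pi * m * k / N) @ frame

def short_time_spectrum(s, N, hop):
    """Windowed FFT of successive length-N frames (rows: time, columns: frequency)."""
    w = np.hanning(N)                         # assumed window choice
    starts = range(0, len(s) - N + 1, hop)
    return np.array([np.fft.fft(s[i:i + N] * w) for i in starts])

# Hypothetical test signal: one second of a 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 1000 * t)

N, hop = 256, 128
S = short_time_spectrum(s, N, hop)

# The FFT computes the same values as the direct DFT sum, only faster.
same = np.allclose(dft(s[:N]), np.fft.fft(s[:N]))

# Strongest bin of the first frame, restricted to non-negative frequencies.
peak_bin = int(np.argmax(np.abs(S[0, :N // 2])))
peak_hz = peak_bin * fs / N
```

Each row of `S` is the complex spectrum of one analysis frame, so the array as a whole is a function of both time (row index) and frequency (column index).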
To improve readability, in the following we switch back to an analogue
frequency description with f as the continuous frequency; the described
concept remains valid for the discrete time and frequency domain.
According to the simplified linear source filter model of speech production, the
speech signal can be modelled by the convolution of the excitation/source signal
E(f) with the excitation transfer function G(f), the transfer function of the vocal
tract H(f), and a transfer function R(f), which describes the sound wave propagation
into the space outside the human body, weighted by an amplitude factor A [6, 19].
Each component is denoted here by its frequency domain representation; the
convolution itself acts on the corresponding time domain signals.
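The convolution in the source filter model can be checked numerically: convolving signals in the time domain gives the same result as multiplying their spectra. The following is a minimal sketch in Python with NumPy. The random sequences `e`, `g`, `h`, `r` and the factor `A` are hypothetical stand-ins for the excitation and the three impulse responses, not realistic speech data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical time-domain stand-ins for the model components.
e = rng.standard_normal(64)   # excitation/source signal
g = rng.standard_normal(4)    # excitation (glottal) filter impulse response
h = rng.standard_normal(16)   # vocal tract impulse response
r = rng.standard_normal(8)    # radiation impulse response
A = 0.5                       # amplitude factor

# Time domain: chain of linear convolutions.
s_time = A * np.convolve(np.convolve(np.convolve(e, g), h), r)

# Frequency domain: zero-pad to the full output length so that the
# circular convolution implied by the DFT equals the linear convolution.
L = len(s_time)
S = A * np.fft.fft(e, L) * np.fft.fft(g, L) * np.fft.fft(h, L) * np.fft.fft(r, L)
s_freq = np.fft.ifft(S).real

match = np.allclose(s_time, s_freq)
```

Zero-padding to the full output length is the design choice that matters here: without it, the product of DFTs corresponds to a circular rather than a linear convolution.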
If the influence of the source is to be eliminated, a deconvolution of the source
and the transfer functions is required. This can be easily achieved in the frequency
domain, where the convolution is expressed as the product of the signal and all transfer
functions:
 