Audio Features - Intelligent Audio Analysis


Strictly speaking, this is a cross-correlation, and the result is not axis-symmetric. Further, negative values may result, as opposed to the normal ACF. It is, however, better suited to short analysis windows, since in this case the effect of fading is particularly significant for the stationary approach. Overall, however, the stationary approach is preferred.

6.2.1.4 Spectrum and Cepstrum

Since speech and most audio signals are generally non-stationary processes that can be considered 'quasi-stationary' only for short time periods, one determines short-time spectra instead of transforming the whole signal into the spectral domain [2, 18].

From the time signal $s(k)$, with a suitable window function $w(n)$, we can determine the short-time spectrum at time $n$, with $k$ as the summation variable of the Fourier transformation. The short-time spectrum is thus a function of time $n$ and frequency $m$.

With the DFT over $N$ samples given as

$$S(m) = \sum_{k=0}^{N-1} s(k)\, e^{-j 2\pi mk/N} \qquad (6.24)$$

the complex short-time spectrum $S(m, n)$ is obtained by [3]:

$$S(m, n) = \sum_{k=0}^{N-1} s(k)\, w(n-k)\, e^{-j 2\pi mk/N} \qquad (6.25)$$

Note that, implementation-wise, the Fast Fourier Transform (FFT) is commonly used for DFT calculation.
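As a minimal sketch of Eqs. (6.24) and (6.25), the direct DFT sum can be compared with a simple FFT and then applied frame by frame. The helper names `dft`, `fft` and `short_time_spectrum`, the periodic Hann window, the frame length of 64 samples, the hop of 32 samples and the 1 kHz test tone are all assumptions made for illustration; none of them come from the text. The framing used here (window applied to N consecutive samples starting at sample n) is a common practical variant, not a literal transcription of the windowed sum.

```python
import cmath
import math

def dft(s):
    """Direct O(N^2) evaluation of the DFT sum, for illustration only."""
    N = len(s)
    return [sum(s[k] * cmath.exp(-2j * cmath.pi * m * k / N) for k in range(N))
            for m in range(N)]

def fft(s):
    """Recursive radix-2 FFT; assumes len(s) is a power of two."""
    N = len(s)
    if N == 1:
        return [complex(s[0])]
    even = fft(s[0::2])   # DFT of the even-indexed samples
    odd = fft(s[1::2])    # DFT of the odd-indexed samples
    out = [0j] * N
    for m in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * m / N) * odd[m]
        out[m] = even[m] + twiddle
        out[m + N // 2] = even[m] - twiddle
    return out

def short_time_spectrum(s, N=64, hop=32):
    """Return one complex spectrum per frame: result[frame][m]."""
    # Periodic Hann window (an assumed choice of window function)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]
    spectra = []
    for n in range(0, len(s) - N + 1, hop):
        frame = [s[n + i] * w[i] for i in range(N)]
        spectra.append(fft(frame))
    return spectra

# Assumed test signal: 1 kHz tone sampled at 8 kHz
fs = 8000
s = [math.sin(2 * math.pi * 1000 * k / fs) for k in range(512)]

S = short_time_spectrum(s)
magnitudes = [abs(x) for x in S[0][:32]]      # bins below fs / 2
peak_bin = magnitudes.index(max(magnitudes))  # frequency index m of the peak
```

With these assumed settings the bin spacing is 8000/64 = 125 Hz, so the tone falls exactly on bin 8 of every frame. The direct `dft` and the recursive `fft` return the same values up to rounding; only the computational cost differs.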

To improve readability, in the following consideration we switch back to an analogue frequency description with $f$ as the continuous frequency; still, the described concept is valid also for the discrete time and frequency domain.

According to the simplified linear source-filter model of speech production, the speech signal can be modelled by the convolution of the excitation/source signal with

• the excitation transfer function $G(f)$,

• the transfer function of the vocal tract $H(f)$,

• and a transfer function $R(f)$, which describes the sound wave propagation into the space outside the human body,

weighted by an amplitude factor $A$ [6, 19].

If the influence of the source is to be eliminated, a deconvolution of the source and the transfer functions is required. This can be easily achieved in the frequency domain, where the convolution is expressed as the product of the signal and all transfer functions.
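The convolution theorem behind this step can be checked numerically. The following is a minimal sketch with two short toy sequences, `e` standing in for the source and `h` for a single combined filter; the values, the helper names and the use of a plain spectral division are assumptions chosen for illustration, not the model parameters of the text. The sequences are picked so that the source spectrum has no zeros, which keeps the division well defined.

```python
import cmath

def dft(x):
    """Direct DFT sum; sufficient for the short sequences used here."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / N) for k in range(N))
            for m in range(N)]

def idft(X):
    """Inverse DFT, with the 1/N normalisation on the inverse side."""
    N = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / N) for m in range(N)) / N
            for k in range(N)]

def convolve(a, b):
    """Linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def pad(x, L):
    """Zero-pad x to length L so that all spectra share the same bins."""
    return list(x) + [0.0] * (L - len(x))

e = [1.0, 0.5]         # toy "source" sequence (assumed values)
h = [1.0, -0.3, 0.1]   # toy "filter" impulse response (assumed values)
s = convolve(e, h)     # time domain: convolution of source and filter
L = len(s)

E, H, S = dft(pad(e, L)), dft(pad(h, L)), dft(s)

# Frequency domain: the convolution becomes a bin-wise product
product_error = max(abs(S[m] - E[m] * H[m]) for m in range(L))

# Deconvolution: divide out the source spectrum and transform back
h_recovered = [v.real for v in idft([S[m] / E[m] for m in range(L)])]
```

Here `product_error` is at the level of floating-point rounding, and `h_recovered` reproduces `h` followed by one zero from the padding. With real speech the source spectrum is not known and may contain values close to zero, so a direct division like this is only a demonstration of the principle.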
