Audio Features - Intelligent Audio Analysis

1 000 Hz, but decreases thereafter. PLP overcomes this by remapping the frequency axis according to the Bark scale and integrating the energy within the critical bands to obtain an approximation of the critical-band spectrum.

• Equal-loudness hearing curve: To simulate the higher sensitivity of human hearing to the middle of the audible frequency range at sound pressure levels typical of normal conversational speech, the critical-band spectrum is multiplied by an equal-loudness curve. This curve suppresses frequency ranges that are relatively low or relatively high compared with the range from 400 to 1 200 Hz.

• Intensity-loudness power law of hearing: The non-linear relation between a sound's physical intensity and its perceived loudness is approximated by the power law of hearing, i.e. a cube-root amplitude compression is applied to the loudness-equalised critical-band spectrum estimate.
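The three perceptual steps can be illustrated on a single frame's power spectrum with the minimal Python sketch below. It rests on several assumptions that are not taken from the text: the Hz-to-Bark warping 6·asinh(f/600) from Hermansky's PLP formulation; equal-width bands on the Bark axis (about 1 Bark each) with rectangular integration instead of PLP's critical-band masking curve; and Hermansky's closed-form equal-loudness approximation, which mainly attenuates the low-frequency range. The function names are hypothetical.

```python
import numpy as np


def hz_to_bark(f_hz):
    # Assumed Bark warping (Hermansky's PLP form): 6 * asinh(f / 600).
    return 6.0 * np.arcsinh(np.asarray(f_hz, dtype=float) / 600.0)


def equal_loudness(f_hz):
    # Assumed: Hermansky's closed-form equal-loudness approximation with
    # omega in rad/s. It rises steeply below ~400 Hz and approaches 1 at
    # high frequencies.
    w2 = (2.0 * np.pi * np.asarray(f_hz, dtype=float)) ** 2
    return ((w2 + 56.8e6) * w2**2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def auditory_spectrum(power_spec, sr, n_bands=None):
    """Hypothetical helper: critical-band integration, equal-loudness
    weighting and cube-root compression for one frame.

    power_spec: |DFT|^2 over bins 0..N/2, e.g. np.abs(np.fft.rfft(frame))**2.
    Returns (band centre frequencies in Hz, compressed band values).
    """
    power_spec = np.asarray(power_spec, dtype=float)
    freqs = np.linspace(0.0, sr / 2.0, len(power_spec))
    barks = hz_to_bark(freqs)
    if n_bands is None:
        n_bands = int(np.ceil(barks[-1]))
    # Simplification: equal-width rectangular bands on the Bark axis.
    edges = np.linspace(0.0, barks[-1], n_bands + 1)
    band_idx = np.clip(np.searchsorted(edges, barks, side="right") - 1,
                       0, n_bands - 1)
    # Step 1: integrate the energy per critical band.
    band_energy = np.bincount(band_idx, weights=power_spec, minlength=n_bands)
    centres_hz = 600.0 * np.sinh(0.5 * (edges[:-1] + edges[1:]) / 6.0)
    # Step 2: equal-loudness weighting at each band centre.
    weighted = band_energy * equal_loudness(centres_hz)
    # Step 3: power law of hearing as cube-root amplitude compression.
    return centres_hz, np.cbrt(weighted)
```

For an 8 kHz signal and a 512-point DFT (257 bins), the default yields 16 bands spanning about 15.6 Bark. Rectangular bands keep the sketch short; a faithful PLP front end would weight neighbouring bins with an overlapping critical-band curve instead.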

The psychoacoustically derived spectrum shows less detail and has a smaller dynamic range. It can therefore be modelled well by a low-order all-pole model, which also weakens speaker-specific characteristics. After the auditory-like spectrum has been estimated, it is converted to autocorrelation function (ACF) values by an inverse DFT. These autocorrelations are then input to a standard LPC analysis, which outputs the PLP coefficients [22]. They can further be converted to cepstral coefficients by the standard recursion (Eq. 6.52).
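The back end just described (ACF from the auditory spectrum, LPC analysis, cepstral recursion) might be sketched as follows. Assumptions: the autocorrelation is taken as NumPy's inverse real FFT of a non-negative spectrum sampled on [0, π]; the LPC normal equations are solved with the Levinson-Durbin recursion; and the cepstral recursion is written in the common form c_n = α_n + Σ_{k=1}^{n−1} (k/n) c_k α_{n−k} with predictor coefficients α_k, whose sign convention may differ from the book's Eq. 6.52. All function names are hypothetical.

```python
import numpy as np


def autocorr_from_power_spectrum(aud_spec, order):
    """Treat a non-negative spectrum on bins 0..N/2 as a power spectrum and
    return its first order + 1 autocorrelation values (inverse DFT)."""
    r = np.fft.irfft(np.asarray(aud_spec, dtype=float))
    return r[: order + 1]


def levinson_durbin(r, order):
    """Solve the LPC normal equations. Returns a with
    A(z) = 1 + sum_k a[k] z^-k, and the final prediction-error power."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """LPC-to-cepstrum recursion using predictor coefficients alpha_k = -a[k].
    Returns c_1..c_n_ceps (c_0, the log gain, is not computed here)."""
    alpha = -np.asarray(a[1:], dtype=float)
    p = len(alpha)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = alpha[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * alpha[n - k - 1]
        c[n] = acc
    return c[1:]
```

For example, feeding the spectrum 1/|1 − 0.5e^{−jω}|² sampled at 513 points on [0, π] with order 2 gives a ≈ [1, −0.5, 0] and cepstral coefficients ≈ 0.5, 0.125, 0.5³/3, which match the series expansion of −ln(1 − 0.5z⁻¹).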

Interestingly, PLP allows for a lower model order than LP analysis. This reduces the number of features and, thereby, the number of parameters needed in a learning algorithm.

A further variant is given by RASTA (RelAtive SpecTrA) PLP coefficients [23]. These aim to ease mismatches between the recording conditions of training and test data by linear filtering. In the RASTA method, a band-pass filter is applied over time to each spectral component of the critical-band spectrum estimate to emphasise modulations in the range of the speech syllable rate. In this way, frame-to-frame spectral changes with modulation frequencies between 1 and 10 Hz are emphasised by the following filter:

$$H(z) = 0.1 \cdot \frac{2 + z^{-1} - z^{-3} - 2z^{-4}}{z^{-4} \cdot \left(1 - 0.98\, z^{-1}\right)} \qquad (6.55)$$

The authors of [23] stress, however, that other filters could be used and that these could be adapted to the frequency band.
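A minimal sketch of applying Eq. (6.55) to each band's log-energy trajectory is given below. It assumes a frame rate of 100 frames per second (10 ms hop) when converting modulation frequency to normalised frequency, implements only the causal part of the filter (the z^{−4} in the denominator is a four-frame advance, so dropping it delays the output by four frames), and exposes the pole 0.98 as a parameter in line with the remark that other filters could be used. The names `rasta_filter` and `rasta_magnitude` are hypothetical.

```python
import numpy as np

# Numerator of Eq. (6.55): 0.1 * (2 + z^-1 - z^-3 - 2 z^-4)
RASTA_B = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])


def rasta_filter(log_bands, pole=0.98):
    """Filter log band energies of shape (n_frames, n_bands) along time.

    Causal implementation of Eq. (6.55) without the z^-4 advance, so the
    output lags the ideal (non-causal) filter by 4 frames.
    """
    x = np.asarray(log_bands, dtype=float)
    y = np.zeros_like(x)
    for t in range(x.shape[0]):
        acc = np.zeros_like(x[t])
        for k, bk in enumerate(RASTA_B):     # FIR part (numerator)
            if t - k >= 0:
                acc += bk * x[t - k]
        if t > 0:                            # single pole at z = pole
            acc += pole * y[t - 1]
        y[t] = acc
    return y


def rasta_magnitude(mod_freq_hz, frame_rate_hz=100.0, pole=0.98):
    """|H| of Eq. (6.55) at a modulation frequency in Hz.
    The z^-4 factor has unit magnitude and does not affect |H|."""
    w = 2.0 * np.pi * np.asarray(mod_freq_hz, dtype=float) / frame_rate_hz
    z1 = np.exp(-1j * w)
    num = 0.1 * (2.0 + z1 - z1**3 - 2.0 * z1**4)
    den = 1.0 - pole * z1
    return np.abs(num / den)
```

Because the numerator coefficients sum to zero, the filter has zero gain at 0 Hz. A fixed linear channel multiplies the power spectrum and therefore adds a constant to the log band energies; such an offset decays away in the filter output, while modulations of around 5 Hz pass with a gain close to 1 at the assumed frame rate.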

The idea behind the RASTA method is that speech is modulated at a different rate than channel effects, background noise, or non-linguistic vocalisations. Moreover, human hearing appears to be less sensitive to slowly varying stimuli [22].

In detail, the processing steps for RASTA PLP cepstral coefficients are: (1) DFT, (2) logarithm, (3) RASTA band-pass filtering, (4) equal-loudness curve, (5) power law of hearing, (6) inverse logarithm, (7) inverse DFT, (8) solving the linear equations for LPC, and (9) cepstral recursion.
