1 000 Hz, but decreases thereafter. PLP overcomes this by remapping the frequency
axis according to the Bark scale and integrating the energy in the critical bands for
a critical-band spectrum approximation.
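A minimal Python/NumPy sketch of the Bark remapping and critical-band integration follows. It assumes the Bark warping Ω = 6 asinh(f/600) and the piecewise critical-band curve commonly associated with Hermansky's PLP formulation. The band count (17) and the equal spacing of band centres on the Bark axis are illustrative choices, not values taken from this text, and the function names are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Bark warping, assumed form: Omega = 6 * asinh(f / 600)."""
    return 6.0 * np.arcsinh(np.asarray(f_hz, dtype=float) / 600.0)

def critical_band_curve(d_bark):
    """Piecewise critical-band weighting as a function of the Bark distance
    from the band centre (assumed PLP-style shape, flat within +/-0.5 Bark)."""
    d = np.asarray(d_bark, dtype=float)
    psi = np.zeros_like(d)
    low = (d >= -1.3) & (d <= -0.5)    # lower skirt, steep rise
    mid = (d > -0.5) & (d < 0.5)       # flat top
    high = (d >= 0.5) & (d <= 2.5)     # upper skirt, slower decay
    psi[low] = 10.0 ** (2.5 * (d[low] + 0.5))
    psi[mid] = 1.0
    psi[high] = 10.0 ** (-1.0 * (d[high] - 0.5))
    return psi

def critical_band_spectrum(power_spec, fs, n_bands=17):
    """Integrate a one-sided (rfft) power spectrum into n_bands critical bands
    whose centres are equally spaced on the Bark axis."""
    power_spec = np.asarray(power_spec, dtype=float)
    freqs = np.linspace(0.0, fs / 2.0, len(power_spec))   # rfft bin frequencies
    bark = hz_to_bark(freqs)
    centres = np.linspace(bark[0], bark[-1], n_bands)
    # weights[i, j]: contribution of bin j to band i
    weights = critical_band_curve(bark[None, :] - centres[:, None])
    return weights @ power_spec, centres
```

For example, `critical_band_spectrum(np.abs(np.fft.rfft(frame, 512)) ** 2, 16000.0)` would turn a 512-point frame spectrum into 17 band energies; the non-uniform Bark spacing makes low-frequency bands span few DFT bins and high-frequency bands span many.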
Equal-loudness hearing curve: To simulate the higher sensitivity of human hearing
to the middle of the audible frequency range at the sound pressure levels of normal
conversational speech, the critical-band spectrum is multiplied by an equal-loudness
curve that attenuates low and high frequencies relative to the range from 400 to
1 200 Hz.
Intensity-loudness power law of hearing: The non-linear relation between a sound's
physical intensity and its perceived loudness is approximated by the power law of
hearing: a cube-root amplitude compression is applied to the loudness-equalised
critical-band spectrum estimate.
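Why the compressed spectrum has a smaller dynamic range can be seen from 10 log10(x^(1/3)) = (1/3) · 10 log10(x): the cube root divides a dynamic range expressed in dB by three. The short sketch below checks this numerically; the helper names are hypothetical.

```python
import numpy as np

def power_law(spectrum, exponent=1.0 / 3.0):
    """Cube-root (intensity-to-loudness) compression of a power spectrum."""
    return np.asarray(spectrum, dtype=float) ** exponent

def dynamic_range_db(x):
    """Ratio of largest to smallest value, in dB (power quantities)."""
    x = np.asarray(x, dtype=float)
    return 10.0 * np.log10(x.max() / x.min())

spectrum = np.array([1.0, 1e3, 1e6])          # spans 60 dB
print(dynamic_range_db(spectrum))              # about 60 dB
print(dynamic_range_db(power_law(spectrum)))   # about 20 dB after the cube root
```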
The psychoacoustically derived spectrum shows less detail and has a smaller dynamic
range. This makes it well suited to modelling by a low-order all-pole model, which
also weakens speaker characteristics. After estimation, the auditory-like spectrum is
converted to autocorrelation function (ACF) values. These autocorrelations are then
input to a standard LPC analysis, which outputs the PLP coefficients [22]. These can
further be converted to cepstral coefficients by the standard recursion (Eq. 6.52).
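The chain from auditory-like spectrum to ACF, LPC (PLP) coefficients and cepstrum can be sketched as follows. The sketch assumes the predictor polynomial A(z) = 1 + Σ a_k z^(-k) solved by the Levinson-Durbin recursion, and the LPC-to-cepstrum recursion that belongs to that convention; texts writing A(z) = 1 − Σ a_k z^(-k) obtain the same recursion with flipped signs, so it may differ in sign from Eq. 6.52. The function names are illustrative.

```python
import numpy as np

def spectrum_to_acf(power_spec_half, order):
    """Autocorrelation r[0..order] from a one-sided power spectrum (inverse real DFT)."""
    return np.fft.irfft(np.asarray(power_spec_half, dtype=float))[: order + 1]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns a = [1, a_1, ..., a_p] with A(z) = 1 + sum_k a_k z^-k and the
    final prediction-error energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])   # correlation of current residual
        k = -acc / err                                # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]           # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_to_cepstrum(a, n_ceps, gain=1.0):
    """Cepstrum c[0..n_ceps] of the all-pole model gain / A(z)."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    c[0] = np.log(gain)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c
```

A possible usage on an auditory spectrum `aud` (one value per rfft bin) would be `a, err = levinson_durbin(spectrum_to_acf(aud, 5), 5)` followed by `lpc_to_cepstrum(a, 12, gain=np.sqrt(err))`; the model order 5 is only an example of the low orders PLP permits.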
Interestingly, PLP allows for a lower model order than conventional LP analysis. This
reduces the number of features and, thereby, the number of parameters needed in a
learning algorithm.
A further variant is RASTA (RelAtive SpecTrA) PLP [23]. It aims at easing mismatches
between the recording conditions of training and test data by linear filtering of the
data. In the RASTA method, a band-pass filter is applied to the time trajectory of each
spectral component of the critical-band spectrum estimate to emphasise modulations
in the range of the speech syllable rate. In this way, frame-to-frame spectral changes
at modulation frequencies between 1 and 10 Hz are emphasised by the following filter:
H(z) = 0.1 \cdot \frac{2 + z^{-1} - z^{-3} - 2z^{-4}}{z^{-4} \cdot (1 - 0.98\,z^{-1})}    (6.55)
The authors of [23] stress, however, that other filters could be used and that these
could be adapted depending on the frequency.
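The band-pass character of Eq. (6.55) can be checked numerically. The sketch below evaluates the magnitude response of its causal part, 0.1 · (2 + z^(-1) − z^(-3) − 2z^(-4)) / (1 − 0.98 z^(-1)); the z^4 factor only shifts the output by four frames and leaves the magnitude unchanged. A frame rate of 100 frames/s (10 ms hop) is an assumption used to express the response in modulation frequency (Hz); the names are hypothetical.

```python
import numpy as np

# Causal part of Eq. 6.55 (the z^4 advance is left out).
RASTA_B = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])   # numerator, powers z^0 .. z^-4
RASTA_A = np.array([1.0, -0.98])                         # denominator 1 - 0.98 z^-1

def rasta_response(mod_freq_hz, frame_rate_hz=100.0):
    """|H| at the given modulation frequencies for an assumed frame rate."""
    z_inv = np.exp(-2j * np.pi * np.asarray(mod_freq_hz, dtype=float) / frame_rate_hz)
    num = sum(b * z_inv ** k for k, b in enumerate(RASTA_B))
    den = RASTA_A[0] + RASTA_A[1] * z_inv
    return np.abs(num / den)

# The numerator vanishes at DC (so constant offsets of the log spectrum, such as a
# fixed channel, are removed) and at the Nyquist modulation frequency; the gain is
# larger in between, around the syllable rate.
print(rasta_response([0.0, 0.1, 5.0, 40.0, 50.0]))
```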
The idea behind the RASTA method is that speech is modulated at a different rate
than channel effects, background noise, or non-linguistic vocalisations. Moreover,
human hearing seems to be less sensitive to slowly varying stimuli [22].
In detail, the processing steps for RASTA PLP cepstral coefficients are: (1) DFT,
(2) logarithm, (3) RASTA band-pass filtering, (4) equal loudness curve, (5) power-
law of hearing, (6) inverse logarithm, (7) inverse DFT, (8) solving linear equations
for LPC, (9) cepstral recursion.
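Steps (1) to (7) can be strung together as in the following sketch, which returns per-frame autocorrelations ready for the LPC solution and cepstral recursion of steps (8) and (9). Reading steps (4) and (5) as log-domain operations (they precede the inverse logarithm), multiplication by the equal-loudness curve becomes addition of its logarithm and the power law becomes a scaling by 1/3. Assumptions: the input is already framed and windowed; the equal-loudness curve is supplied by the caller as one positive weight per DFT bin (no particular curve is implied); the critical-band integration described earlier is omitted for brevity; and the z^4 advance of Eq. 6.55 is dropped, delaying the output by four frames. The function name is hypothetical.

```python
import numpy as np

RASTA_B = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])   # Eq. 6.55 numerator
RASTA_POLE = 0.98                                        # Eq. 6.55 pole

def rasta_plp_acf(frames, eql_weights, order=8, exponent=1.0 / 3.0, eps=1e-10):
    """Steps (1)-(7) for a (n_frames x frame_len) array of windowed frames.

    eql_weights: positive weight per rfft bin (caller-supplied, hypothetical).
    Returns autocorrelations r[0..order] for every frame.
    """
    frames = np.asarray(frames, dtype=float)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (1) DFT -> power spectrum
    log_spec = np.log(spec + eps)                          # (2) logarithm (eps avoids log 0)
    filt = np.zeros_like(log_spec)                         # (3) RASTA along the time axis
    for t in range(log_spec.shape[0]):
        for k, b in enumerate(RASTA_B):
            if t >= k:
                filt[t] += b * log_spec[t - k]
        if t >= 1:
            filt[t] += RASTA_POLE * filt[t - 1]            # recursive part, 1 - 0.98 z^-1
    log_aud = exponent * (filt + np.log(eql_weights))      # (4) + (5) in the log domain
    aud = np.exp(log_aud)                                  # (6) inverse logarithm
    acf = np.fft.irfft(aud, axis=1)                        # (7) inverse DFT -> ACF
    return acf[:, : order + 1]

rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 256)) * np.hanning(256)
r = rasta_plp_acf(frames, np.ones(129))
print(r.shape)   # (50, 9)
```

Each row of the result can be handed to a Levinson-Durbin solver and then to the cepstral recursion to complete steps (8) and (9).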