Table 19.3 Applied functionals of feature set I
Functionals applied to LLD/ΔLLD
Quartiles 1-3, 3 inter-quartile ranges
1 % percentile (≈ min), 99 % percentile (≈ max)
Position of min/max
Percentile range 1-99 %
Arithmetic mean (a), root quadratic mean
Contour centroid, flatness
Standard deviation, skewness, kurtosis
Rel. duration LLD is above/below 25/50/75/90 % range
Rel. duration LLD is rising/falling
Rel. duration LLD has positive/negative curvature (b)
Gain of linear prediction (LP), LP Coefficients 1-5
Mean, max, min, std. dev. of segment length (c)
Functionals applied to LLD only
Mean of peak distances
Standard deviation of peak distances
Mean value of peaks
Mean value of peaks minus arithmetic mean
Mean/std.dev. of rising/falling slopes
Mean/std.dev. of inter maxima distances
Amplitude mean of maxima/minima
Amplitude range of maxima
Linear regression slope, offset, quadratic error
Quadratic regression coefficients a, b, offset, quadratic error
Percentage of non-zero frames (d)
(a) Arithmetic mean of LLD/positive ΔLLD
(b) Only applied to voice related LLD
(c) Not applied to voice related LLD except F0
(d) Only applied to F0
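To make the peak and regression entries of the table more concrete, here is a minimal sketch in plain Python. It is not the openSMILE implementation: the function names are hypothetical, peaks are taken to be strict local maxima, and the quadratic error is taken to be the mean squared residual. All of these are assumptions made for illustration.

```python
import statistics


def find_peaks(contour):
    """Indices of strict local maxima (a simplifying assumption)."""
    return [i for i in range(1, len(contour) - 1)
            if contour[i - 1] < contour[i] > contour[i + 1]]


def peak_functionals(contour):
    """Peak-based functionals of one LLD contour (a list of frame values)."""
    peaks = find_peaks(contour)
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to measure peak distances")
    distances = [b - a for a, b in zip(peaks, peaks[1:])]  # in frames
    peak_mean = statistics.fmean([contour[i] for i in peaks])
    return {
        "peak_dist_mean": statistics.fmean(distances),
        "peak_dist_std": statistics.pstdev(distances),
        "peak_mean": peak_mean,
        "peak_mean_minus_mean": peak_mean - statistics.fmean(contour),
    }


def linear_regression_functionals(contour):
    """Least-squares line y = slope * t + offset over the frame index t."""
    n = len(contour)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(contour)
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(contour))
    slope = s_ty / s_tt
    offset = y_mean - slope * t_mean
    residuals = [(y - (slope * t + offset)) ** 2 for t, y in enumerate(contour)]
    return {"slope": slope, "offset": offset,
            "quadratic_error": statistics.fmean(residuals)}
```

Distances are expressed in frames here; converting them to seconds would require the frame rate, which the table does not specify.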
The set includes energy, spectral, cepstral (MFCC) and voicing related low-level
descriptors (LLDs) as well as a few LLDs including logarithmic harmonic-to-noise
ratio (HNR), spectral harmonicity, and psychoacoustic spectral sharpness.
A number of functionals are applied to these LLDs in order to extract higher-level
statistics; they are listed in Table 19.3. Altogether, the final feature set I contains
6,373 features.
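The idea of applying functionals to LLD contours can be sketched as follows: each contour, whatever its length, is mapped to a fixed number of statistics, and the statistics of all contours are concatenated. This is only an illustrative sketch in plain Python. The function names, the linear-interpolation percentile, and the use of a simple first difference as a stand-in for the ΔLLD contour are assumptions, not the toolkit's actual procedure, and only a handful of the functionals from Table 19.3 are shown.

```python
import statistics


def percentile(sorted_vals, p):
    """Percentile with linear interpolation, p in [0, 100]."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def statistical_functionals(contour):
    """Map one contour (at least two values) to 12 statistics."""
    s = sorted(contour)
    q1, q2, q3 = (percentile(s, p) for p in (25, 50, 75))
    p1, p99 = percentile(s, 1), percentile(s, 99)
    rising = sum(b > a for a, b in zip(contour, contour[1:]))
    return [
        q1, q2, q3,                  # quartiles 1-3
        q2 - q1, q3 - q2, q3 - q1,   # three inter-quartile ranges
        p1, p99, p99 - p1,           # 1 % / 99 % percentiles and their range
        statistics.fmean(contour),   # arithmetic mean
        statistics.pstdev(contour),  # standard deviation
        rising / (len(contour) - 1), # relative duration the contour is rising
    ]


def feature_vector(lld_contours):
    """Concatenate functionals of every contour and of its first difference."""
    vec = []
    for contour in lld_contours:
        # Simple difference used as a stand-in for the delta contour.
        delta = [b - a for a, b in zip(contour, contour[1:])]
        vec += statistical_functionals(contour) + statistical_functionals(delta)
    return vec
```

With two contours, `feature_vector` returns 2 x 2 x 12 = 48 values regardless of how many frames each contour has, which is what makes the result usable as a fixed-length input to a classifier.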
Feature set II is extracted on a frame-wise basis. To keep the number of
features under control, only a relatively small set of descriptors is
calculated for each frame. Using the openSMILE toolkit again, we extract
low-level descriptors (LLDs) and functionals every 10 ms, adopting a frame size of
20 ms. Specifically, we compute frame-wise logarithmic energy and Mel-frequency
cepstral coefficients (MFCC) 1-12 along with their first- and second-order delta (Δ)
regression coefficients, as typically used in automatic speech recognition. These are
augmented by voicing probability, HNR, F0 and zero-crossing rate, as well as their
 