Digital Signal Processing Reference
Table 1.3 Classification performance; normalized accuracy (%)
        Front-end
NS      Expolog LPC   Expolog DCT   20Bands LPC   20Bands DCT   20Bands DCT11   MFCC   PLP
None    83.7          83.1          81.4          81.9          84.2            84.1   83.6
FWSS    85.6          85.1          86.2          85.4          83.5            87.6   88.2
Fig. 1.7 Front-end's classification performance (x-axis: front-end; y-axis range: 77.0 to 89.0; series: None, FWSS)
and cepstra extracted from a uniform filterbank of 20 non-overlapping rectangular
filters distributed on a linear frequency scale (20Bands) [15] were compared. MFCC
represent a common baseline front-end in speech/speaker recognition, and numerous
studies have shown PLP to perform comparably to or better than MFCC in various
speech-related tasks [14].
Expolog is an outcome of studies on accent classification and stressed speech
recognition, and features based on the 20Bands filterbank have shown superior
properties in noisy neutral and Lombard speech recognition [15].
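As an illustration of the 20Bands idea, the following is a minimal sketch, not the authors' implementation. It assumes a one-sided power spectrum is already available for each frame, sums it into 20 equal-width, non-overlapping rectangular bands on a linear frequency scale, and then applies the log and DCT-II steps familiar from MFCC extraction. The function names, the log floor, and the choice to include the zeroth coefficient among the 13 outputs are assumptions made for this example.

```python
import math


def uniform_rect_filterbank(power_spectrum, n_bands=20):
    """Sum power-spectrum bins into equal-width, non-overlapping bands.

    Hypothetical helper: band edges are spaced linearly over the bin index,
    which corresponds to a linear frequency scale.
    """
    n_bins = len(power_spectrum)
    if n_bins < n_bands:
        raise ValueError("need at least one spectral bin per band")
    energies = []
    for b in range(n_bands):
        lo = b * n_bins // n_bands          # first bin of band b
        hi = (b + 1) * n_bins // n_bands    # one past the last bin of band b
        energies.append(sum(power_spectrum[lo:hi]))
    return energies


def dct_cepstra(band_energies, n_ceps=13, floor=1e-10):
    """DCT-II of the log band energies (unnormalized).

    Assumption: coefficient 0 is kept, so n_ceps=13 returns c0..c12.
    """
    n = len(band_energies)
    log_e = [math.log(max(e, floor)) for e in band_energies]
    return [
        sum(log_e[m] * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n))
        for k in range(n_ceps)
    ]


# Usage: a flat spectrum gives equal band energies, so every cepstral
# coefficient except the zeroth vanishes (up to rounding).
flat_spectrum = [1.0] * 200
bands = uniform_rect_filterbank(flat_spectrum)
cepstra = dct_cepstra(bands)
```

Swapping `uniform_rect_filterbank` for a triangular Mel filterbank would give the usual MFCC chain, which is the sense in which one filterbank replaces another while the remaining steps stay the same.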
In this study, the Expolog and 20Bands filterbanks were used either as a replacement
for the triangular Mel filterbank in MFCC, yielding front-ends denoted Expolog
DCT and 20Bands DCT, or as a replacement for the trapezoidal Bark filterbank in PLP,
yielding setups denoted Expolog LPC and 20Bands LPC. In order to reduce the
impact of strong background noise on classification, Full Wave Spectral Subtraction
(FWSS) utilizing Burg's cepstral-based voice activity detector [14] was incorporated
in the feature extraction. The classification results are summarized in Table 1.3 and
Fig. 1.7. The first row of results in Table 1.3 represents the performance of a classifier
without noise subtraction (NS), denoted “None.”
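The noise subtraction step can be sketched as follows. This is a hedged illustration rather than the method used in the study: it assumes that "full wave" refers to full-wave rectification, that is, taking the absolute value of the difference between the noisy magnitude spectrum and the noise estimate instead of clipping negative values to zero. The voice activity decisions are passed in as plain booleans standing in for the cepstral detector, and the function name, the smoothing constant `alpha`, and the recursive noise update are assumptions.

```python
def full_wave_spectral_subtraction(frames, is_speech, alpha=0.9):
    """Subtract a running noise estimate from each magnitude spectrum.

    frames    : list of magnitude spectra (lists of floats), one per frame
    is_speech : list of booleans from a voice activity detector (assumed given)
    alpha     : hypothetical smoothing factor for the noise estimate
    """
    noise = None
    cleaned = []
    for mag, speech in zip(frames, is_speech):
        if not speech:
            # Update the noise estimate only in frames flagged as non-speech.
            if noise is None:
                noise = list(mag)
            else:
                noise = [alpha * n + (1.0 - alpha) * m
                         for n, m in zip(noise, mag)]
        if noise is None:
            # No noise estimate yet: pass the frame through unchanged.
            cleaned.append(list(mag))
        else:
            # Full-wave rectification: keep the absolute value of the difference.
            cleaned.append([abs(m - n) for m, n in zip(mag, noise)])
    return cleaned


# Usage: one noise-only frame followed by one speech frame.
example = full_wave_spectral_subtraction(
    [[1.0, 2.0], [5.0, 1.0]],
    [False, True],
)
```

Half-wave rectification would replace `abs(m - n)` with `max(m - n, 0.0)`; the full-wave variant never outputs a hard zero unless the frame matches the noise estimate exactly.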
In the majority of cases, FWSS considerably improves
performance. Among front-ends employing 13 static coefficients and their first- and