Digital Signal Processing Reference
Fig. A.1 Mel-filter bank (horizontal axis: frequency in Hz, 0 to 4000; vertical axis: 0 to 1)
\[
H_m(k) =
\begin{cases}
0, & k < f(m-1) \\[4pt]
\dfrac{2\,\bigl(k - f(m-1)\bigr)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\[8pt]
\dfrac{2\,\bigl(f(m+1) - k\bigr)}{f(m+1) - f(m)}, & f(m) < k \le f(m+1) \\[8pt]
0, & k > f(m+1)
\end{cases}
\tag{A.5}
\]
with m ranging from 0 to M - 1.
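The piecewise definition in Eq. (A.5) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names, the `edges` array (assumed to hold the M + 2 boundary points f(-1), f(0), ..., f(M) already expressed as bin positions), and the `gain` parameter (defaulting to the factor 2 in Eq. (A.5)) are all assumptions introduced here.

```python
import numpy as np


def mel_filter(k, f_prev, f_center, f_next, gain=2.0):
    """Evaluate one triangular filter H_m(k) following Eq. (A.5).

    f_prev, f_center, f_next stand for f(m-1), f(m), f(m+1).
    Assumes f_prev < f_center < f_next (hypothetical precondition).
    """
    k = np.asarray(k, dtype=float)
    rising = gain * (k - f_prev) / (f_center - f_prev)
    falling = gain * (f_next - k) / (f_next - f_center)

    h = np.zeros_like(k)                      # zero outside [f(m-1), f(m+1)]
    up = (k >= f_prev) & (k <= f_center)      # rising edge of the triangle
    down = (k > f_center) & (k <= f_next)     # falling edge of the triangle
    h[up] = rising[up]
    h[down] = falling[down]
    return h


def mel_filter_bank(edges, n_bins, gain=2.0):
    """Stack M filters into an (M, n_bins) array.

    edges is assumed to hold the M + 2 boundary points f(-1) .. f(M).
    """
    edges = np.asarray(edges, dtype=float)
    k = np.arange(n_bins)
    num_filters = len(edges) - 2
    rows = [mel_filter(k, edges[m], edges[m + 1], edges[m + 2], gain)
            for m in range(num_filters)]
    return np.stack(rows)
```

For example, `mel_filter_bank([0, 4, 8, 12], 13)` would give two overlapping triangles, each rising from one boundary point to the next and falling back to zero at the one after.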
5. Discrete Cosine Transform (DCT): Since the vocal tract is smooth, the energy
levels in adjacent bands tend to be correlated. Applying the DCT to the
transformed mel-frequency coefficients produces a set of cepstral coefficients.
Prior to computing the DCT, the mel spectrum is usually represented on a log scale.
This results in a signal in the cepstral domain with a quefrency peak
corresponding to the pitch of the signal and a number of formants representing
low-quefrency peaks. Since most of the signal information is represented
by the first few MFCCs, the system can be made robust by extracting
only those coefficients and ignoring or truncating the higher-order DCT components
[1]. Finally, the MFCCs are calculated as [1]
\[
c(n) = \sum_{m=0}^{M-1} \log_{10}\bigl(s(m)\bigr)\,
\cos\!\left(\frac{\pi n\,(m - 0.5)}{M}\right),
\qquad n = 0, 1, 2, \ldots, C-1
\tag{A.6}
\]
where c(n) are the cepstral coefficients and C is the number of MFCCs.
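As a rough illustration, Eq. (A.6) can be evaluated directly as a matrix product between a cosine basis and the log mel energies. The sketch below follows the indexing of Eq. (A.6) as printed (m from 0 to M - 1 with the offset m - 0.5). The function name and its arguments are hypothetical, and the mel-band energies s(m) are assumed to be strictly positive so that the logarithm is defined.

```python
import numpy as np


def mfcc_from_mel_energies(s, num_coeffs):
    """Compute c(n), n = 0 .. C-1, from mel-band energies s(m) via Eq. (A.6).

    s          : sequence of M strictly positive mel-band energies (assumed).
    num_coeffs : C, the number of cepstral coefficients to keep.
    """
    s = np.asarray(s, dtype=float)
    M = s.shape[0]
    m = np.arange(M)                       # band index, 0 .. M-1
    n = np.arange(num_coeffs)[:, None]     # coefficient index as a column
    basis = np.cos(np.pi * n * (m - 0.5) / M)   # shape (C, M)
    return basis @ np.log10(s)             # shape (C,)
```

Keeping only the first `num_coeffs` rows of the basis is what truncates the higher-order DCT components: the remaining coefficients are simply never computed.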
Traditional MFCC systems use only 8-13 cepstral coefficients. The zeroth