Digital Signal Processing Reference
approximately centered at some frame. On each frame a window is applied to
taper the signal towards the frame boundaries. Generally, Hann (Hanning) or
Hamming windows are used [1]. Windowing enhances the harmonics, smooths the
edges, and reduces edge effects when the DFT of the signal is taken.
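The tapering described above can be sketched as follows. This is a minimal illustration, assuming the common Hamming coefficients 0.54 and 0.46 (the text does not state them); the function names and the constant test frame are hypothetical.

```python
import math


def hamming_window(length):
    """Return Hamming coefficients w(n) = 0.54 - 0.46*cos(2*pi*n/(length-1)).

    The 0.54/0.46 coefficients are the usual textbook values, assumed here.
    """
    if length < 2:
        return [1.0] * length
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame):
    """Multiply a frame sample by sample with a Hamming window of equal length."""
    window = hamming_window(len(frame))
    return [x * w for x, w in zip(frame, window)]


# Hypothetical frame: a constant signal, so the output equals the window shape.
frame = [1.0] * 8
windowed = apply_window(frame)
```

The samples at the frame boundaries are scaled down to 0.08 while samples near the center stay close to their original values, which is the tapering towards the frame boundaries that the text refers to.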
3. DFT spectrum: Each windowed frame is converted into a magnitude spectrum
by applying the DFT:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2 \pi n k / N}, \qquad 0 \le k \le N-1 \tag{A.2}$$
where N is the number of points used to compute the DFT.
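As an illustration of Eq. (A.2), the sketch below evaluates the DFT sum directly and takes the magnitude of each bin. It assumes the standard forward-transform sign convention (negative exponent) and uses a hypothetical single-cycle cosine as the frame. The direct sum costs O(N^2); in practice an FFT routine would normally be used instead.

```python
import cmath
import math


def dft(x):
    """Directly evaluate X(k) = sum_n x(n) * exp(-j*2*pi*n*k/N) for k = 0..N-1."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]


def magnitude_spectrum(x):
    """Return |X(k)| for every DFT bin of the frame x."""
    return [abs(X_k) for X_k in dft(x)]


# Hypothetical frame: exactly one cosine cycle across N = 8 samples.
N = 8
frame = [math.cos(2.0 * math.pi * n / N) for n in range(N)]
mags = magnitude_spectrum(frame)
```

For this frame the energy falls into bins k = 1 and k = N - 1, each with magnitude N/2, and all other bins are zero up to rounding error.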
4. Mel spectrum: The mel spectrum is computed by passing the Fourier-transformed
signal through a set of band-pass filters known as a mel filter bank. A mel is a
unit of measure based on the frequency perceived by the human ear. It does not
correspond linearly to the physical frequency of the tone, as the human auditory
system apparently does not perceive pitch linearly. The mel scale has
approximately linear frequency spacing below 1 kHz and logarithmic
spacing above 1 kHz [4]. The mel value can be approximated from the physical
frequency as
$$f_{\mathrm{mel}} = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \tag{A.3}$$
where $f$ denotes the physical frequency in Hz, and $f_{\mathrm{mel}}$ denotes the
perceived frequency [2].
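The mapping in Eq. (A.3) is short enough to sketch directly. The inverse function below is not given in the text; it is obtained by solving Eq. (A.3) for f, and both function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Eq. (A.3): f_mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel):
    """Algebraic inverse of Eq. (A.3), derived here rather than quoted from the text."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)


# 1 kHz maps to roughly 1000 mel, the point where the scale changes character.
mel_at_1khz = hz_to_mel(1000.0)
round_trip = mel_to_hz(hz_to_mel(440.0))
```

With these constants, 1000 Hz maps to approximately 1000 mel, and doubling the frequency above that point adds fewer than 1000 mel, reflecting the logarithmic spacing above 1 kHz.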
Filter banks can be implemented in either the time domain or the frequency domain.
For MFCC computation, filter banks are generally implemented in the frequency
domain. The center frequencies of the filters are normally evenly spaced on the
frequency axis. However, to mimic the human ear's perception, the axis is warped
according to the non-linear function given in Eq. (A.3), so that the centers are
evenly spaced on the mel scale instead. The most commonly used filter shape is
triangular, and in some cases Hanning filters are used [1]. The triangular filter
bank with mel-frequency warping is shown in Fig. A.1.
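The warped spacing can be sketched by placing points evenly on the mel axis and mapping them back to Hz. This is an illustrative sketch only: the filter count, the band edges, and the convention of treating the two band edges as extra points outside the centers are assumptions, not values taken from the text.

```python
import math


def hz_to_mel(f_hz):
    """Eq. (A.3): f_mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel):
    """Algebraic inverse of Eq. (A.3)."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, f_low_hz, f_high_hz):
    """Center frequencies in Hz for filters evenly spaced on the mel axis.

    Assumed convention: num_filters + 2 equally spaced mel points span the
    band, and the interior points are the filter centers.
    """
    mel_low = hz_to_mel(f_low_hz)
    mel_high = hz_to_mel(f_high_hz)
    step = (mel_high - mel_low) / (num_filters + 1)
    return [mel_to_hz(mel_low + (m + 1) * step) for m in range(num_filters)]


# Hypothetical setup: 10 filters covering 0 Hz to 4 kHz.
centers = mel_center_frequencies(10, 0.0, 4000.0)
spacings = [b - a for a, b in zip(centers, centers[1:])]
```

The centers are equidistant in mel, but the spacing in Hz grows from one filter to the next, so the filters are narrow at low frequencies and wide at high frequencies.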
The mel spectrum of the magnitude spectrum $X(k)$ is computed by multiplying
the magnitude spectrum by each of the triangular mel weighting filters:
$$s(m) = \sum_{k=0}^{N-1} \left[\, |X(k)|^{2}\, H_m(k) \,\right], \qquad 0 \le m \le M-1 \tag{A.4}$$
where $M$ is the total number of triangular mel weighting filters [5, 6]. $H_m(k)$ is
the weight given to the $k$th energy spectrum bin contributing to the $m$th output
band and is expressed as: