Digital Signal Processing Reference
In-Depth Information
$$ S(f) = E(f) \cdot G(f) \cdot H(f) \cdot R(f) \cdot A. \tag{6.26} $$
In the logarithmic domain, this product turns into a summation. The signal part that
is owed to E(f) can be eliminated by high- or band-pass filtering. In the case of
high-pass filtering, this requires that these parts are indeed of low frequency, so
that formants are not cut away (cf. Sect. 6.2.1.8). This high-pass is best realised on
the back-transformation of the logarithmised power spectrum into the time domain.
This leads to the so-called cepstrum, with the independent variable d, the
'quefrency' [7]. These names were coined from the terms 'spectrum' and 'frequency'
by re-ordering characters. The variable d has the dimension of time and corresponds
to the delay in the ACF, which is the reason for the choice of the same identifier.
By applying the logarithm to the power spectrum, the product relationship of the
source signal and the transfer functions turns into a sum relationship. After the
back-transformation to the time domain (i.e., in the cepstrum), the additive
concatenation of the linear source-filter model components remains [2]:
$$
\begin{aligned}
x(d) &= \mathrm{IDFT}\left[\log |S(f)|^2\right] && \text{(6.27)} \\
     &= \mathrm{IDFT}\left[\log |E(f)|^2 + \log |G(f)|^2 + \log |H(f)|^2 + \log |R(f)|^2 + \log |A|^2\right] && \text{(6.28)} \\
     &= e(d) + g(d) + h(d) + r(d) + A, && \text{(6.29)}
\end{aligned}
$$
where (I)DFT is the (Inverse) Discrete Fourier Transformation, and e(d), g(d), h(d),
and r(d) are the equivalents of their capitalised frequency domain counterparts E(f),
G(f), etc. The cepstrum is real-valued if computed from the amplitude or power
spectrum, as these are both axis-symmetrical [6]. The desired high-pass can be obtained
by trimming the cepstrum after the first fundamental period, i.e., at T_0.
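
As a rough illustration of Eqs. 6.27 to 6.29, the following NumPy sketch computes a cepstrum as the IDFT of the log power spectrum, so that one can check numerically that a product of spectra becomes a sum of cepstra. The names `power_cepstrum` and `circular_convolve`, the small `eps` guard against log(0), and the use of circular convolution to build a test signal are assumptions made for this example; they do not come from the text.

```python
import numpy as np


def power_cepstrum(frame, eps=1e-12):
    """Cepstrum as the IDFT of the log power spectrum (cf. Eq. 6.27)."""
    spectrum = np.fft.fft(frame)
    # eps keeps the logarithm finite for empty bins (an assumption of this sketch).
    log_power = np.log(np.abs(spectrum) ** 2 + eps)
    cepstrum = np.fft.ifft(log_power)
    # For a real frame the log power spectrum is even-symmetric, so the
    # imaginary part of the IDFT is only rounding noise and is dropped.
    return cepstrum.real


def circular_convolve(a, b):
    """Circular convolution: its DFT is exactly the product of both DFTs."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(64)   # stands in for one model component
    channel = rng.standard_normal(64)  # stands in for another component
    signal = circular_convolve(source, channel)

    lhs = power_cepstrum(signal)
    rhs = power_cepstrum(source) + power_cepstrum(channel)
    print("max deviation from additivity:", np.max(np.abs(lhs - rhs)))
```

The deviation printed at the end should be on the order of rounding error, which is the additive relationship of Eq. 6.29 restricted to two components.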
Variations of the classical cepstrum use other back-transformations such as the
Discrete Cosine Transformation (DCT) or PCA for de-correlation.
If one maps the power spectrum onto Mel-frequency scale bands, then takes the
logarithms of the powers of each band, and applies a DCT transformation to the
resulting values, one obtains the Mel-frequency cepstral coefficients (MFCCs). The
mapping onto Mel-frequency scale bands is typically performed by triangular filters
which are equidistantly spaced on the Mel-frequency scale. This scale takes the
physiology of human hearing into account: the frequency resolution of the human
ear is higher for low frequencies and lower for high frequencies; an approximately
logarithmic relationship of the frequency resolution to the absolute frequency exists
[5]. The Mel-frequency scale Mel(f) is given by:
$$ \mathrm{Mel}(f) = 2595 \cdot \log\left(1 + \frac{f}{700}\right). \tag{6.30} $$
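
Eq. 6.30 can be sketched in a few lines of NumPy. The sketch assumes that f is given in Hz and that the logarithm is taken to base 10, which is the convention the constant 2595 normally goes with; the function names `hz_to_mel` and `mel_to_hz` are chosen for this example only. The inverse mapping is included because it is needed to place filter edges back on the Hz axis.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Mel(f) = 2595 * log10(1 + f / 700), with f in Hz (Eq. 6.30, base 10 assumed)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel, obtained by solving Eq. 6.30 for f."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


if __name__ == "__main__":
    for f in (100.0, 1000.0, 4000.0, 8000.0):
        print(f, "Hz ->", float(hz_to_mel(f)), "Mel")
```

With these constants, 1000 Hz maps to approximately 1000 Mel, and equal steps in Mel correspond to increasingly wide steps in Hz towards higher frequencies.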
MFCCs are among the most popular audio features. Usually coefficients 0 up to 16
are used. For speech recognition in particular, coefficients 0-12 are applied most
frequently.
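
The MFCC steps described above (power spectrum, triangular Mel-spaced bands, logarithm, DCT) can be sketched as follows for a single frame. This is a minimal illustration under stated assumptions, not a reference implementation: the number of filters (26), the `eps` floor, the unnormalised DCT-II, the base-10 logarithm in the Mel formula, and all function names are choices made for this example. Only the 13 returned coefficients (0 to 12) follow the text directly.

```python
import numpy as np


def hz_to_mel(f_hz):
    # Eq. 6.30 with a base-10 logarithm (assumed convention).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equidistant on the Mel scale."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        left, centre, right = hz_edges[i:i + 3]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Triangle: minimum of both slopes, clipped to zero outside the band.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct_ii(x):
    """Unnormalised DCT-II of a 1-D array."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T


def mfcc(frame, sample_rate, n_filters=26, n_coeffs=13, eps=1e-10):
    """MFCCs of one frame: power spectrum -> Mel bands -> log -> DCT."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    band_power = mel_filterbank(n_filters, frame.size, sample_rate) @ power
    log_band_power = np.log(band_power + eps)  # eps avoids log(0)
    return dct_ii(log_band_power)[:n_coeffs]   # coefficients 0 .. n_coeffs - 1


if __name__ == "__main__":
    sr = 16000
    t = np.arange(512) / sr
    frame = np.hanning(512) * np.sin(2.0 * np.pi * 440.0 * t)
    print(mfcc(frame, sr))
```

Practical front ends usually add framing with overlap, pre-emphasis, and a normalised DCT; these are left out here to keep the correspondence with the description above easy to follow.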
 