Digital Signal Processing Reference
trained decision is made to ensure that the audio onset belongs to the type of signal
one is interested in. Some standard methods are found in [10-14].
6.2 Audio Low Level Descriptors
This section introduces a variety of important acoustic low-level descriptors (LLDs)
which are commonly used in the fields of speech, music, and general sound analysis.
The following description of audio LLDs assumes digitised audio: the signal is
represented as s(k) in the discrete time domain, where the discrete time index k
denotes the k-th sample. Further, because parameters are sampled by windowing the
signal, a second time variable is needed: a time n for the instant at which parameters
are measured over a window of length N (see Sect. 6.1 for details on digital audio
signal representations and windowing).
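The two time variables can be illustrated with a minimal Python sketch that cuts s(k) into windows of length N. The function name `frame_signal` and the `hop` parameter (the step between successive measurement instants) are assumptions made for illustration; the text above only fixes the sample index k and the window length N.

```python
import numpy as np

def frame_signal(s, N, hop):
    """Split the sampled signal s(k) into frames of N samples each.

    Row i of the result holds the samples k = i*hop ... i*hop + N - 1.
    `hop` is a hypothetical parameter, not taken from the text.
    """
    s = np.asarray(s, dtype=float)
    if N <= 0 or hop <= 0:
        raise ValueError("N and hop must be positive")
    if len(s) < N:
        # Signal shorter than one window: no complete frame exists.
        return np.empty((0, N))
    num_frames = 1 + (len(s) - N) // hop
    # Index matrix: one row of sample indices k per frame.
    idx = np.arange(N)[None, :] + hop * np.arange(num_frames)[:, None]
    return s[idx]

frames = frame_signal(np.arange(10), N=4, hop=2)
print(frames.shape)
```

Each row can then be passed to any per-window descriptor. Frames that would run past the end of the signal are dropped in this sketch; padding is an equally valid choice.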
6.2.1 Speech Descriptors
Among the most important descriptors for speech signals are the intensity, the
fundamental frequency F_0 together with the probability of voiced/unvoiced speech, and
the formants, i.e., the resonance frequencies F_X of the vocal tract, with X typically
between 1 and 7, together with anti-formants. Further, the voice quality parameters
jitter and shimmer are often of interest; these are micro-perturbations of the
fundamental frequency period lengths and intensities, respectively. Parameters
describing the structure of the spectrogram are shaped in particular by the
characteristics of the vocal tract.
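As a rough illustration of jitter and shimmer, the sketch below uses one common "local" formulation: the mean absolute difference between consecutive values, divided by their mean. This formula and the function names `local_jitter` and `local_shimmer` are assumptions, not definitions taken from the text. The sketch further assumes that the pitch period lengths and one amplitude value per period have already been extracted.

```python
import numpy as np

def _relative_perturbation(values):
    """Mean absolute difference of consecutive values over their mean."""
    v = np.asarray(values, dtype=float)
    if v.size < 2:
        raise ValueError("need at least two values")
    return float(np.mean(np.abs(np.diff(v))) / np.mean(v))

def local_jitter(periods):
    """Perturbation of consecutive F_0 period lengths (assumed formula)."""
    return _relative_perturbation(periods)

def local_shimmer(amplitudes):
    """Perturbation of consecutive per-period amplitudes (assumed formula)."""
    return _relative_perturbation(amplitudes)

# Hypothetical period lengths in seconds, roughly 100 Hz with small variation.
periods = [0.0100, 0.0102, 0.0099, 0.0101]
print(local_jitter(periods))
```

A perfectly periodic signal gives zero for both measures; larger values indicate stronger cycle-to-cycle variation.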
6.2.1.1 Intensity
Rather than modelling the psycho-acoustically perceived intensity, which usually
depends on the energy, pitch, duration, and spectral shape of a stimulus [15],
just the physical energy E of the signal s(k) is used as an approximate measure of
intensity. It is defined as [16]:
$$E = \sum_{k=-\infty}^{+\infty} s^2(k). \tag{6.14}$$
With short-time analysis, the energy E(n) at time n is determined as

$$E(n) = \sum_{k=n-N/2}^{n+N/2-1} \left[ s(k)\, w(n-k-1) \right]^2, \tag{6.15}$$
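The two energy measures (6.14) and (6.15) can be sketched in Python as follows. The sketch rests on several assumptions: the window covers the N samples k = n - N/2, ..., n + N/2 - 1; the window is symmetric, so the order in which its coefficients are applied does not matter; samples outside the recorded signal count as zero; and the window defaults to a rectangular one. The function names are hypothetical.

```python
import numpy as np

def total_energy(s):
    """Energy of a finite signal: sum of squared samples, cf. Eq. (6.14)."""
    s = np.asarray(s, dtype=float)
    return float(np.sum(s ** 2))

def short_time_energy(s, n, N, window=None):
    """Energy of the windowed signal around sample n, cf. Eq. (6.15).

    Assumes the window spans k = n - N//2 ... n - N//2 + N - 1 and is
    symmetric; samples outside the signal are treated as zero.
    """
    s = np.asarray(s, dtype=float)
    w = np.ones(N) if window is None else np.asarray(window, dtype=float)
    if w.shape != (N,):
        raise ValueError("window must have exactly N coefficients")
    k = np.arange(n - N // 2, n - N // 2 + N)   # sample indices in the window
    inside = (k >= 0) & (k < len(s))            # indices that exist in s
    segment = np.zeros(N)
    segment[inside] = s[k[inside]]
    return float(np.sum((segment * w) ** 2))

s = np.ones(16)
print(short_time_energy(s, n=8, N=4))                        # rectangular window
print(short_time_energy(s, n=8, N=4, window=np.hamming(4)))  # Hamming window
```

With a rectangular window and a constant signal of ones, the result simply counts the samples inside the window, which is a convenient sanity check.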
 