Audio Features - Intelligent Audio Analysis


trained decision is made to assure that the audio onset belongs to the type of signal

one is interested in. Some standard methods are found in [10–14].

6.2 Audio Low Level Descriptors

This section introduces a variety of important acoustic low-level descriptors (LLDs)

which are commonly used in the fields of speech, music, and general sound analysis.

The following description of audio LLDs assumes digitised audio. The signal is thus represented as s(k) in the discrete time domain, where the discrete time index k denotes the k-th sample. Further, the sampling of parameters by windowing of the signal requires a second time variable: a time n for the instant at which parameters are measured over a window of length N (see Sect. 6.1 for details on digital audio signal representations and windowing).

6.2.1 Speech Descriptors

Among the most important descriptors for speech signals are the intensity, the fundamental frequency F0 together with the probability of voiced/unvoiced speech, and the formants, i.e., the resonance frequencies FX of the vocal tract, with X typically between 1 and 7, together with anti-formants. Further, the voice quality parameters jitter and shimmer are often of interest: these are micro-perturbations of the fundamental frequency period lengths and intensities, respectively. Parameters describing the structure of the spectrogram are shaped in particular by the characteristics of the vocal tract.
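To make the notion of jitter and shimmer as micro-perturbations more concrete, the following is a minimal sketch. It assumes one common formulation that is not given in this section: the mean absolute difference between consecutive values, divided by the mean value. The function name `relative_perturbation` and the example numbers are hypothetical and serve only as illustration.

```python
def relative_perturbation(values):
    """Mean absolute difference of consecutive values, divided by their mean.

    Assumed formulation: applied to pitch period lengths it gives a
    jitter-like measure, applied to per-period amplitudes a shimmer-like one.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical measurements for four consecutive pitch periods.
periods_ms = [10.0, 10.2, 9.9, 10.1]    # period lengths in milliseconds
amplitudes = [0.80, 0.78, 0.82, 0.79]   # peak amplitude per period

jitter = relative_perturbation(periods_ms)
shimmer = relative_perturbation(amplitudes)
```

Both values are dimensionless ratios; a perfectly periodic signal with constant amplitude would yield zero for each.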

6.2.1.1 Intensity

Rather than modelling the psycho-acoustically perceived intensity, which usually depends on the energy, pitch, duration, and spectral shape of a stimulus [15], just the physical energy E of the signal s(k) is used as an approximate measure of intensity. It is defined as [16]:

$$E = \sum_{k=-\infty}^{+\infty} s^2(k). \qquad (6.14)$$
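For a recorded signal of finite length, the infinite sum in Eq. (6.14) reduces to a sum over the available samples. A minimal sketch, assuming the samples are held in a plain Python list (the name `signal_energy` is hypothetical):

```python
def signal_energy(samples):
    """Sum of squared sample values, i.e. Eq. (6.14) for a finite signal."""
    return sum(x * x for x in samples)


# Hypothetical four-sample signal s(k).
s = [0.0, 0.5, -0.5, 1.0]
energy = signal_energy(s)
```

Samples outside the recorded range are implicitly treated as zero, so they contribute nothing to the sum.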

With short-time analysis, the energy E(n) at time n is determined as

$$E(n) = \sum_{k=n-\frac{N}{2}}^{n+\frac{N}{2}-1} \left[ s(k)\, w(n-k-1) \right]^2, \qquad (6.15)$$
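The index bookkeeping of Eq. (6.15) can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: N is taken to be even, samples outside the recorded signal are treated as zero, and the window w is stored as a list of N values whose first entry corresponds to the window argument -N/2. The Hamming window is an assumed choice, since this section does not specify w; the names `hamming` and `short_time_energy` are hypothetical.

```python
import math


def hamming(N):
    """Hamming window of length N (assumed choice; the text does not fix w)."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (N - 1))
            for i in range(N)]


def short_time_energy(s, n, N, window=None):
    """Windowed energy around time n, following the index range of Eq. (6.15)."""
    if N <= 0 or N % 2 != 0:
        raise ValueError("this sketch assumes a positive, even window length N")
    if window is None:
        window = [1.0] * N          # rectangular window
    if len(window) != N:
        raise ValueError("window must have exactly N values")
    half = N // 2
    energy = 0.0
    for k in range(n - half, n + half):     # k = n - N/2 ... n + N/2 - 1
        if 0 <= k < len(s):                 # assumption: zero outside the signal
            m = n - k - 1                   # window argument in -N/2 ... N/2 - 1
            energy += (s[k] * window[m + half]) ** 2
    return energy


# Hypothetical constant signal; with a rectangular window of length 4
# the energy is simply the sum of four squared samples.
s = [1.0] * 8
e_rect = short_time_energy(s, n=4, N=4)
e_hamm = short_time_energy(s, n=4, N=4, window=hamming(4))
```

A tapering window such as the Hamming window down-weights samples near the frame edges, so the resulting energy is lower than with a rectangular window over the same frame.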
