Figure 2. Frequency and time summarization in feature extraction
a summarization step a set of numbers (the feature
vector) is calculated. The feature vector attempts
to summarize/capture the content information of
that short slice in time. A piece of music can then
be represented as a sequence of feature vectors.
By detecting abrupt changes in the trajectory of
the feature vectors, segmentation can be performed,
and by detecting regions in feature space,
classification can be performed. Most audio features
are extracted in three stages: (1) spectrum
calculation, (2) frequency-domain summarization, and
(3) time-domain summarization. In spectrum
calculation, a short-time slice (typically around
10 to 40 milliseconds) of waveform samples is
transformed to a frequency-domain representation.
The most common such transformation is
the Short-Time Fourier Transform (STFT). During
each short-time slice, the signal is assumed
to be approximately stationary and is windowed
to reduce the effect of discontinuities at the start
and end of the frame. This frequency domain
transformation preserves all the information in
the signal and therefore the resulting spectrum
has high dimensionality. For analysis purposes,
it is necessary to find a more succinct description
that has significantly lower dimensionality while
still retaining the desired content information.
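
The spectrum calculation stage described above can be sketched roughly as follows. This is a minimal illustration rather than a reference implementation: it uses a naive DFT from the Python standard library instead of an FFT, and the Hann window, the 8 kHz sampling rate, the 256-sample frame, the 128-sample hop, and all function names are assumptions chosen for the example, not values taken from the text.

```python
import cmath
import math


def hann_window(n):
    """Return an n-point Hann window, which tapers to zero at both ends."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Window one frame and return DFT magnitudes for bins 0..N/2.

    Uses a naive O(N^2) DFT for clarity; a real system would use an FFT.
    """
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann_window(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(windowed):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc))
    return spectrum


def stft_magnitudes(samples, frame_size, hop_size):
    """Slice the signal into overlapping frames; return one spectrum per frame."""
    return [
        magnitude_spectrum(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, hop_size)
    ]


# Assumed test signal: 0.1 s of a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(800)]

# 256 samples at 8 kHz is a 32 ms slice, inside the 10 to 40 ms range.
frames = stft_magnitudes(signal, frame_size=256, hop_size=128)

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
peak_hz = peak_bin * sample_rate / 256
```

Each 256-sample frame yields 129 magnitude values here, which illustrates why the raw spectrum is considered high-dimensional and why a later summarization stage is useful.
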
Frequency-domain summarization converts the
high-dimensional spectrum (typically 512 or 1024
coefficients) to a smaller set of numerical features
(typically 10-30). A common approach is to use