MARSYAS-0.2: A Case Study in Implementing Music Information Retrieval Systems - Intelligent Music Information Systems: Tools and Methodologies


Figure 2. Frequency and time summarization in feature extraction

a summarization step a set of numbers (the feature

vector) is calculated. The feature vector attempts

to summarize/capture the content information of

that short slice in time. A piece of music can then

be represented as a sequence of feature vectors.
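
As a rough illustration of this representation, the sketch below slices a signal into short frames and computes one small feature vector per frame. The two features used here (RMS energy and zero-crossing rate), the helper names, and the frame size are assumptions chosen for brevity; they are not taken from MARSYAS.

```python
import math

def frame_signal(samples, frame_size, hop_size):
    """Split the samples into short-time slices of frame_size samples."""
    last_start = len(samples) - frame_size
    return [samples[start:start + frame_size]
            for start in range(0, last_start + 1, hop_size)]

def feature_vector(frame):
    """Summarize one slice with two illustrative numbers (assumed features)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zero_crossing_rate = crossings / (len(frame) - 1)
    return [rms, zero_crossing_rate]

def extract_features(samples, frame_size=512, hop_size=512):
    """Represent the whole signal as a sequence of feature vectors."""
    frames = frame_signal(samples, frame_size, hop_size)
    return [feature_vector(frame) for frame in frames]

# One second of a 440 Hz sine at 22050 Hz; 512 samples is roughly 23 ms.
sample_rate = 22050
tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
features = extract_features(tone)
print(len(features), "feature vectors, each of length", len(features[0]))
```

Each entry of `features` describes one slice in time, so the list as a whole traces a trajectory through feature space.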

By detecting abrupt changes in the trajectory of

the feature vectors, segmentation can be performed,

and by detecting regions in feature space, classification

can be performed. Most audio features

are extracted in three stages: (1) spectrum calculation,

(2) frequency-domain summarization, and

(3) time-domain summarization. In spectrum

calculation, a short-time slice (typically around

10 to 40 milliseconds) of waveform samples is

transformed to a frequency-domain representation.

The most common such transformation is

the Short-Time Fourier Transform (STFT).

During each short-time slice, the signal is assumed

to be approximately stationary and is windowed

to reduce the effect of discontinuities at the start

and end of the frame. This frequency domain

transformation preserves all the information in

the signal and therefore the resulting spectrum

has high dimensionality. For analysis purposes,

it is necessary to find a more succinct description

that has significantly lower dimensionality while

still retaining the desired content information.
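
The spectrum calculation stage, and the motivation for reducing its output, can be sketched as follows. This is a minimal standard-library example, not the MARSYAS implementation: the Hann window, the recursive FFT helper, and the spectral centroid used as a single summary number are all assumptions made for illustration. In practice a library routine such as NumPy's `numpy.fft.rfft` would replace the hand-written transform.

```python
import cmath
import math

def hann_window(n):
    """Symmetric Hann window; tapers both ends of the frame to zero."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def magnitude_spectrum(frame):
    """Window one short-time slice and return magnitudes for bins 0..N/2."""
    window = hann_window(len(frame))
    spectrum = fft([s * w for s, w in zip(frame, window)])
    return [abs(c) for c in spectrum[:len(frame) // 2 + 1]]

def spectral_centroid(magnitudes, sample_rate, frame_size):
    """Magnitude-weighted mean frequency in Hz (an assumed example summary)."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / frame_size
    return sum(k * bin_hz * m for k, m in enumerate(magnitudes)) / total

# One 512-sample slice (about 23 ms at 22050 Hz) of a 1000 Hz sine.
sample_rate = 22050
frame_size = 512
frame = [math.sin(2 * math.pi * 1000 * n / sample_rate)
         for n in range(frame_size)]
magnitudes = magnitude_spectrum(frame)
centroid = spectral_centroid(magnitudes, sample_rate, frame_size)
print(len(magnitudes), "spectral magnitudes reduced to one number:", centroid)
```

A single slice already yields hundreds of spectral values, which is why a much shorter description is computed before further analysis.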

Frequency-domain summarization converts the

high-dimensional spectrum (typically 512 or 1024

coefficients) to a smaller set of numerical features

(typically 10-30). A common approach is to use
