MARSYAS-0.2: A Case Study in Implementing Music Information Retrieval Systems - Intelligent Music Information Systems: Tools and Methodologies

Specific Topics

In this section we discuss in more detail some specific topics that we believe are particularly interesting to the designer of audio processing frameworks.

Implicit Patching

The basic idea behind Implicit Patching (Bray & Tzanetakis, 2005) is to use object composition rather than explicitly specifying connections between input and output ports in order to construct the dataflow network. For example, the pseudo-code in Figure 3 illustrates the difference between Explicit and Implicit Patching in a simple playback network.

The idea of Implicit Patching evolved from the integration of three different ideas that were developed independently in previous versions of MARSYAS. These three ideas and how they are integrated are described below. In addition, examples illustrating the expressive power of Implicit Patching are presented.

The first idea originated from the desire not to be constrained to fixed buffer sizes and to have proper semantics for spectral data. The majority of existing audio processing environments require that all processing objects in a flow network or visual patch process fixed buffers of audio samples (typical sizes are 64 and 128 samples). Having fixed buffers simplifies memory management and patching, as all connections are treated the same way. However, some applications, like audio feature extraction, require a variety of different buffer sizes to flow through the network (for example, feature vectors typically have much lower dimensionality than audio data). Even though it is possible to have dynamic buffer sizes in Explicit Patching, it is complex to implement and frequently requires a lot of work from the programmer to appropriately set the connections. In addition, these fixed-sized buffers are reused for holding spectral data, and it is up to the programmer to correctly connect the spectral data to objects that process such data. The result is that the exact details of the Short-Time Fourier Transform are encapsulated as a black box and the programmer has little control over the process.

Our proposed solution to these two problems is to extend the semantics of the data that is processed. In MARSYAS, processing objects (MarSystems) operate on chunks of data called Slices. Slices are matrices of floating point numbers characterized by three parameters: number of samples (things that are “measured” at different instances in time), number of observations (things that are “measured” at the same time instance) and sampling rate. This approach is similar to the Sound Description Interchange Format (SDIF) (Schwarz & Wright, 1997).

Figure 4 shows a MarSystem for spectral processing that converts an incoming audio buffer of 512 samples of 1 observation at a sampling rate of 22050 Hz to 1 sample of 512 observations (the FFT bins) at a lower sampling rate of 22050/512 Hz. By propagating information about the sampling rate and number of observations through the dataflow network, the use of Slices provides more correct and flexible semantics for spectral processing and feature extraction. MarSystems are designed so that they can handle Slices with arbitrary dimensions with one important constraint: they need to be able to calculate their output Slice parameters from their input Slice parameters. For example, it is possible to change the input number of samples

Figure 3. Explicit and implicit patching

# EXPLICIT PATCHING
create source, gain, dest
# connect the appropriate in/out ports
connect(source.out1, gain.in1);
connect(gain.out1, dest.in1);

# IMPLICIT PATCHING
create source, gain, dest
# create a composite that is the network
create series(source, gain, dest)
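
Two ideas from this section can be sketched together in a few lines of Python: building a network by composition instead of wiring ports, and having each processing object derive its output Slice parameters from its input Slice parameters. This is only an illustrative sketch under stated assumptions. The names SliceParams, MarSystem.output_params, Gain, Spectrum and Series are hypothetical and are not taken from the actual MARSYAS API, and no audio is processed. Only the three Slice parameters are propagated, using the 512-sample, 22050 Hz figures quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceParams:
    """The three parameters that characterize a Slice (hypothetical type)."""
    n_samples: int       # values measured at different time instances
    n_observations: int  # values measured at the same time instance
    sample_rate: float   # rate of the samples, in Hz


class MarSystem:
    """Hypothetical base class for a processing object.

    The one constraint modeled here: output Slice parameters must be
    computable from the input Slice parameters.
    """

    def output_params(self, params: SliceParams) -> SliceParams:
        # Default behavior: dimensions and rate pass through unchanged.
        return params


class Gain(MarSystem):
    """Scaling does not alter dimensions, so the default applies."""


class Spectrum(MarSystem):
    """Maps N samples of 1 observation to 1 sample of N observations."""

    def output_params(self, params: SliceParams) -> SliceParams:
        if params.n_observations != 1:
            raise ValueError("this sketch only handles 1 input observation")
        return SliceParams(
            n_samples=1,
            n_observations=params.n_samples,  # one observation per bin
            sample_rate=params.sample_rate / params.n_samples,
        )


class Series(MarSystem):
    """Composite: the order of the children *is* the patching."""

    def __init__(self, *children: MarSystem) -> None:
        self.children = list(children)

    def output_params(self, params: SliceParams) -> SliceParams:
        # Each child's output parameters become the next child's input,
        # so no explicit connect() calls are needed.
        for child in self.children:
            params = child.output_params(params)
        return params


# Usage: an audio buffer of 512 samples, 1 observation, at 22050 Hz.
network = Series(Gain(), Spectrum())
audio_in = SliceParams(n_samples=512, n_observations=1, sample_rate=22050.0)
spectrum_out = network.output_params(audio_in)
print(spectrum_out)
```

Because the composite itself is treated as a processing object in this sketch, a Series could be nested inside another Series without any additional wiring, which is the kind of expressiveness the text attributes to composition.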
