MARSYAS-0.2: A Case Study in Implementing Music Information Retrieval Systems - Intelligent Music Information Systems: Tools and Methodologies


send the results to a collector process (possibly running on the same machine as the dispatcher) that gathers the results. The audio collection was partitioned into subcollections, one of which was sent to each worker. All the audio clips are stored on the dispatcher. We found that the optimal number of worker nodes in this model was three, after which there was no time benefit from using extra machines. In fact, it was costly to add any more than five worker nodes due to the network capacity of the dispatcher collector. The use of multiple dispatchers in a hierarchical fashion can improve results. More details can be found in Bray and Tzanetakis (2005b).
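As a rough illustration of the collection dispatcher model described above, the following Python sketch partitions a collection into subcollections, lets each worker process its share, and merges the partial results in a collector step. It is not MARSYAS code: threads stand in for networked worker nodes, and the names `partition`, `run_collection_dispatcher`, and `fake_extract` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def partition(files: Sequence[str], num_workers: int) -> List[List[str]]:
    """Split files into num_workers contiguous subcollections of near-equal size."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    base, extra = divmod(len(files), num_workers)
    parts: List[List[str]] = []
    start = 0
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)  # the first `extra` parts get one more file
        parts.append(list(files[start:start + size]))
        start += size
    return parts


def run_collection_dispatcher(
    files: Sequence[str],
    num_workers: int,
    extract: Callable[[str], object],
) -> Dict[str, object]:
    """Dispatcher: split the collection once. Collector: merge the partial results."""
    subcollections = partition(files, num_workers)

    def worker(subcollection: List[str]) -> Dict[str, object]:
        # Each worker processes only the subcollection it was handed up front.
        return {name: extract(name) for name in subcollection}

    collected: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(worker, subcollections):
            collected.update(partial)  # collector step
    return collected


def fake_extract(name: str) -> int:
    """Placeholder for real feature extraction (an assumption for illustration)."""
    return len(name)


if __name__ == "__main__":
    clips = [f"clip_{i:03d}.wav" for i in range(10)]
    print([len(p) for p in partition(clips, 3)])
    print(len(run_collection_dispatcher(clips, 3, fake_extract)))
```

Because the split happens once, before any work starts, a worker that receives slow files cannot hand any of them to a worker that finishes early.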

Table 1 shows results of using the collection dispatcher model, using up to 5 worker nodes and audio collections of up to 10,000 files. The format is hours:minutes:seconds.

The problem with the collection dispatcher approach is that some nodes may finish processing the features of their respective subcollection before others and then have to sit idle. Thus the time it takes to process the entire collection depends on the slowest node in the system. To alleviate this problem and make use of idle nodes, an adaptive approach is used in which the dispatcher sends data to each worker as necessary. That way, each node in the system keeps working until the dispatcher has finished processing the files in the collection. Table 2 shows the increase in performance based on this approach.
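The difference between the two dispatching strategies can be sketched with a small simulation. The following Python code is a simplified model, not the MARSYAS implementation: it assumes a known processing cost per file, ignores network transfer time, and uses the hypothetical names `static_makespan` and `adaptive_makespan`. The per-file costs in the example are made up for illustration.

```python
import heapq
from typing import Sequence


def static_makespan(costs: Sequence[float], num_workers: int) -> float:
    """Total time when the collection is pre-split into contiguous subcollections."""
    base, extra = divmod(len(costs), num_workers)
    finish_times = []
    start = 0
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)
        finish_times.append(sum(costs[start:start + size]))
        start += size
    # The whole job is done only when the slowest worker is done.
    return max(finish_times)


def adaptive_makespan(costs: Sequence[float], num_workers: int) -> float:
    """Total time when each file goes to whichever worker becomes idle first."""
    free_at = [0.0] * num_workers  # min-heap of the times at which workers become idle
    heapq.heapify(free_at)
    for cost in costs:
        earliest = heapq.heappop(free_at)  # next worker to ask for data
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


if __name__ == "__main__":
    # One slow file followed by eight fast ones (hypothetical costs in seconds).
    costs = [9, 1, 1, 1, 1, 1, 1, 1, 1]
    print("static:  ", static_makespan(costs, 3))
    print("adaptive:", adaptive_makespan(costs, 3))
```

With these costs and three workers, the static split takes 11 time units because one worker receives the slow file plus two more, whereas the adaptive scheme finishes in 9, the cost of the slow file alone.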

Table 2. Parallelization results for adaptive dispatcher

Worker nodes    10 files    100 files    1000 files    10000 files
Local           00:05       00:58        09:39         1:36:49
1               00:07       01:10        11:48         1:58:49
2               00:04       00:40        06:21         1:02
3               00:03       00:33        05:33         57:10
4               00:03       00:31        05:24         54:20
5               00:03       00:31        05:25         54:15

Typically, feature extraction tests run on audio collections of around 10,000 files. Based on our results, we expect a linear trend as collection size increases. To test that hypothesis, a large-scale test using the data-partitioning model (2 dispatchers with half the audio data each) with the adaptive dispatcher was conducted on 100,000 files. As expected, the 100,000-clip test took approximately 10 times as long (5:00:44) as the 10,000-file test. Experimental results show that using 5 computers we can perform audio feature extraction for 100,000 30-second clips in about 5 hours.
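The reported timings can be turned into speedup and throughput figures with a little arithmetic. The Python sketch below uses only values quoted in the text and in Table 2 (local 1:36:49 and five workers 54:15 for 10,000 files, and 5:00:44 for 100,000 30-second clips). The helper `to_seconds` is a hypothetical name introduced for illustration.

```python
def to_seconds(stamp: str) -> int:
    """Convert a time written as 'h:mm:ss' or 'mm:ss' into seconds."""
    seconds = 0
    for field in stamp.split(":"):
        seconds = seconds * 60 + int(field)
    return seconds


# 10,000 files with the adaptive dispatcher (Table 2).
local_time = to_seconds("1:36:49")
five_worker_time = to_seconds("54:15")
speedup = local_time / five_worker_time

# Large-scale test: 100,000 clips of 30 seconds each in 5:00:44.
large_test_time = to_seconds("5:00:44")
seconds_per_clip = large_test_time / 100_000
realtime_factor = (100_000 * 30) / large_test_time

if __name__ == "__main__":
    print(f"speedup with 5 workers: {speedup:.2f}x")
    print(f"time per clip:          {seconds_per_clip:.3f} s")
    print(f"audio processed at:     {realtime_factor:.0f}x real time")
```

Under these figures, five workers give a speedup of roughly 1.8 over the local run on 10,000 files, and the large-scale test processes each 30-second clip in under a fifth of a second on average.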

Conclusions and Future Work

In this chapter we described MARSYAS, a free software framework for audio applications with specific emphasis on MIR. We showed how MARSYAS addresses some of the requirements and challenges facing the designer of audio processing frameworks. It is our hope that the ideas and concepts presented in this chapter can be applied to other MIR software frameworks and tools.

As a general conclusion, dataflow architectures can help express complicated processing structures easily while retaining efficiency and being easy to parallelize. Finally, the development of free software in an academic setting is not only possible, but has many benefits, such as increased visibility, collaboration opportunities, communication, and even monetary rewards.

MARSYAS is an ongoing project and therefore there are always many directions of future

Table 1. Parallelization results for collection dispatcher

Worker nodes    10 files    100 files    1000 files    10000 files
Local           00:05       00:58        09:39         1:36:49
1               00:07       01:10        11:48         1:58:49
2               00:03       00:38        06:01         1:10:46
3               00:04       00:34        05:49         59:46
4               00:03       00:34        05:52         1:04:56
5               00:04       00:36        05:54         1:08:36
