send the results to a collector process (possibly
running on the same machine as the dispatcher)
that gathers the results. The audio collection was
partitioned into subcollections, one of which was
sent to each worker. All the audio clips were
stored on the dispatcher. We found that the optimal
number of worker nodes in this model was three,
after which extra machines brought no further time
benefit. In fact, adding more than five worker
nodes was costly because of the limited network
capacity of the dispatcher/collector. Using
multiple dispatchers in a hierarchical fashion can
improve results. More details can be found in Bray
and Tzanetakis (2005b).
Table 1 shows the results of using the collection
dispatcher model with up to 5 worker nodes and
audio collections of up to 10,000 files. Times are
given as hours:minutes:seconds.
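One way to read Table 1 is to convert the timings to seconds and compute the speedup of each configuration over the local run. The sketch below does this for the 10,000-file column only. The helper names (to_seconds, speedups, fastest) are illustrative assumptions and are not part of MARSYAS.

```python
# Wall-clock times from the 10,000-file column of Table 1 (h:mm:ss or mm:ss).
TABLE1_10000 = {
    "Local": "1:36:49",
    "1W": "1:58:49",
    "2W": "1:10:46",
    "3W": "59:46",
    "4W": "1:04:56",
    "5W": "1:08:36",
}


def to_seconds(stamp):
    """Convert 'h:mm:ss' or 'mm:ss' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def speedups(times, baseline="Local"):
    """Speedup of every configuration relative to the baseline run."""
    base = to_seconds(times[baseline])
    return {
        name: base / to_seconds(stamp)
        for name, stamp in times.items()
        if name != baseline
    }


def fastest(times, baseline="Local"):
    """Name of the configuration with the largest speedup."""
    ratios = speedups(times, baseline)
    return max(ratios, key=ratios.get)


if __name__ == "__main__":
    for name, ratio in speedups(TABLE1_10000).items():
        print(f"{name}: {ratio:.2f}x")
    print("fastest:", fastest(TABLE1_10000))
```

For these figures the three-worker run comes out fastest, and the single-worker run is slower than the local run, which is consistent with the network cost of shipping clips from the dispatcher.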
The problem with the collection dispatcher
approach is that some nodes may finish processing
their respective subcollections before others and
then have to sit idle. Thus the time it takes to
process the entire collection depends on the
slowest node in the system. To alleviate this
problem and make use of idle nodes, an adaptive
approach is used in which the dispatcher sends
data to each worker as needed. That way, each node
in the system keeps working until the dispatcher
has finished processing the files in the
collection. Table 2 shows the increase in
performance based on this approach.
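The difference between the two dispatching strategies can be illustrated with a small simulation. This is a toy sketch, not the MARSYAS implementation: the per-file costs are made-up numbers, network transfer time is ignored, and the function names are hypothetical. The static version fixes each worker's subcollection up front, while the adaptive version hands the next file to whichever worker becomes idle first.

```python
import heapq


def static_makespan(costs, n_workers):
    """Collection dispatcher: fixed subcollections decided before processing.

    Files are split round-robin (an assumption made for this sketch); the
    total time is set by the slowest worker.
    """
    loads = [sum(costs[i::n_workers]) for i in range(n_workers)]
    return max(loads)


def adaptive_makespan(costs, n_workers):
    """Adaptive dispatcher: send the next file to the first idle worker."""
    # Min-heap holding the time at which each worker next becomes idle.
    free_at = [0.0] * n_workers
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)         # earliest idle worker
        heapq.heappush(free_at, start + cost)  # busy until start + cost
    return max(free_at)


if __name__ == "__main__":
    # Hypothetical per-file processing costs in seconds (not measured data).
    costs = [9, 1, 1, 1, 9, 1, 1, 1]
    print("static:  ", static_makespan(costs, 2))
    print("adaptive:", adaptive_makespan(costs, 2))
```

On this toy input the static split leaves one worker with both expensive files, whereas the adaptive queue balances the load. Real gains depend on how uneven the per-file costs are and on dispatch overhead.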

Table 2. Parallelization results for adaptive dispatcher

Files       10      100     1000    10000
Local    00:05    00:58    09:39  1:36:49
1W       00:07    01:10    11:48  1:58:49
2W       00:04    00:40    06:21     1:02
3W       00:03    00:33    05:33    57:10
4W       00:03    00:31    05:24    54:20
5W       00:03    00:31    05:25    54:15

Typically feature extraction tests run on audio
collections of around 10,000 files. Based on our
results, we expect a linear trend as collection size
increases. To test that hypothesis, a large-scale
test using the data-partitioning model (2
dispatchers with half the audio data each) with the
adaptive dispatcher was conducted on 100,000 files.
As expected, the 100,000-clip test (5:00:44) took
approximately 10 times as long as the 10,000-file
test. Experimental results show that using 5
computers we can perform audio feature extraction
for 100,000 30-second clips in about 5 hours.
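To put the large-scale result in perspective, the reported figures can be turned into a per-clip cost and a real-time factor. The sketch below assumes that 5:00:44 is the total wall-clock time of the 100,000-clip run, as the text suggests; the function names are illustrative.

```python
def to_seconds(stamp):
    """Convert 'h:mm:ss' or 'mm:ss' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def throughput(n_clips, clip_seconds, elapsed):
    """Per-clip cost and ratio of audio duration to wall-clock time."""
    wall = to_seconds(elapsed)
    return {
        "seconds_per_clip": wall / n_clips,
        "realtime_factor": (n_clips * clip_seconds) / wall,
    }


if __name__ == "__main__":
    # Figures quoted in the text: 100,000 clips of 30 seconds in 5:00:44
    # (assumed here to be the total elapsed time of that run).
    stats = throughput(100_000, 30, "5:00:44")
    print(f"{stats['seconds_per_clip']:.3f} s per clip")
    print(f"{stats['realtime_factor']:.0f}x faster than real time")
```

Under that assumption, each clip costs roughly 0.18 seconds of wall-clock time across the 5 machines, or about 166 times faster than real-time playback.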
Conclusions and Future Work
In this chapter we described MARSYAS, a free
software framework for audio applications with
specific emphasis on MIR. We showed how MARSYAS
addresses some of the requirements and challenges
facing the designer of audio processing
frameworks. It is our hope that the ideas and
concepts presented in this chapter can be applied
to other MIR software frameworks and tools. As a
general conclusion, dataflow architectures make it
easy to express complicated processing structures
while retaining efficiency and being easy to
parallelize. Finally, the development of free
software in an academic setting is not only
possible, but has many benefits such as increased
visibility, collaboration opportunities,
communication and even monetary rewards.

Table 1. Parallelization results for collection dispatcher

Files       10      100     1000    10000
Local    00:05    00:58    09:39  1:36:49
1W       00:07    01:10    11:48  1:58:49
2W       00:03    00:38    06:01  1:10:46
3W       00:04    00:34    05:49    59:46
4W       00:03    00:34    05:52  1:04:56
5W       00:04    00:36    05:54  1:08:36

MARSYAS is an ongoing project and therefore
there are always many directions of future
 