66,000 audio features from the space of features
which could be constructed using Yale. Then,
they evaluate the contribution of each feature to
the classification task at hand. This tremendous
effort is rewarded by the best performance of this
tailored feature set when compared to fixed feature sets. Interestingly, the general-purpose feature set of Tzanetakis ranks in the middle of the feature sets, while a feature set adapted to a different classification task performs the worst. General-purpose features cannot achieve the best performance in all tasks: tailoring to one classification task also means failing at other, different ones. Most likely, this hand-tailored feature set would likewise fail in another classification task. Without noticing it, the authors provide another justification for the need for automatic, adaptive feature construction.
To the best of our knowledge, besides Yale there exists only one other approach to automatic feature construction, EDS (Zils & Pachet, 2004). Developed at the same time, the two systems are similar in the tree structure of feature extractions, its XML representation format, and the choice of genetic programming for the construction. However, the fitness functions of EDS and Yale differ. EDS employs a heuristic score which evaluates the composition of operators (building blocks) independently of the classification task for which the features are to be extracted. In contrast, we use the performance of classification learning as the fitness function for evolving feature construction. Therefore, the Yale feature construction adapts rigorously to the classification task at hand. The method of growing the tree of feature extractions also differs: EDS instantiates a set of general patterns, a procedure similar to a grammar producing well-formed sentences. We describe the Yale procedure in detail below (Section 4).
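The task-dependent fitness function just described can be sketched as follows. This is an illustrative reconstruction, not the Yale implementation: the toy feature extractors, the synthetic data, and the scikit-learn classifier are all assumptions made for the sketch.

```python
# Illustrative sketch (not the Yale implementation): the fitness of a
# candidate feature construction is the cross-validated accuracy of a
# classifier trained on the features it extracts from the series.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fitness(feature_extractors, series_list, labels):
    """Score a candidate feature set by classification performance.

    feature_extractors: functions mapping a value series (1-D array)
    to a scalar feature; they stand in for the trees of operators
    evolved by genetic programming.
    """
    X = np.array([[f(s) for f in feature_extractors] for s in series_list])
    return cross_val_score(SVC(), X, labels, cv=3).mean()

# Toy usage: two hypothetical extractors on synthetic series whose
# classes differ in amplitude spread.
rng = np.random.default_rng(0)
series = [rng.normal(scale=1 + (i % 2), size=256) for i in range(30)]
labels = [i % 2 for i in range(30)]
score = fitness([np.mean, np.std], series, labels)
```

A genetic-programming loop would call such a function on each evolved candidate and select for higher scores, which is what ties the construction to the classification task at hand.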
The choice of the method for classifier learning seems to have less impact on the result than the chosen feature set does. Primarily, we use the support vector machine (SVM), as did, for example, Maddage, Xu, and Wang (2003), Mandel and Ellis (2005), and Meng and Shawe-Taylor (2005).
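A minimal sketch of this classification step, assuming the scikit-learn API (the feature vectors, their dimensionality, and the RBF kernel are assumptions; the chapter does not fix a kernel):

```python
# Sketch: an SVM trained on fixed-length feature vectors extracted
# from audio clips (here replaced by synthetic stand-in vectors).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Rows are clips, columns are extracted features; class 1 is shifted.
X_train = rng.normal(size=(40, 5)) + np.repeat([[0.0], [1.5]], 20, axis=0)
y_train = np.repeat([0, 1], 20)

clf = SVC(kernel="rbf")  # kernel choice is an assumption for the sketch
clf.fit(X_train, y_train)
pred = clf.predict(X_train[:4])
```

Because the learner matters less than the features, swapping `SVC` for another estimator with the same `fit`/`predict` interface would leave the surrounding pipeline unchanged.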
Recently, some authors have embedded music information retrieval into a peer-to-peer (p2p) setting (Wang, Li, & Shi, 2002). Beyond the run-time advantages, we see the support of user collaboration as the crucial benefit and have developed the Nemoz system as a p2p system. Tzanetakis and colleagues have pushed forward this view (Tzanetakis, Gao, & Steenkiste, 2003):
“One of the greatest potential benefits of p2p
networks is the ability to harness the collaborative
efforts of users to provide semantic, subjective
and community-based tags to describe musical
content.”
A Unifying Framework for Feature Extraction
Audio data are time series, where the y-axis is the current amplitude corresponding to a loudspeaker's membrane and the x-axis corresponds to time. They are univariate, finite, and equidistant.
We may generalize the type of series which we want to investigate to value series. Each element x_i of the series consists of two components. The first is the index component, which indicates a position on a straight line (e.g., time). The second component is an m-dimensional vector of values which is an element of the value space.
Definition 1. A value series is a mapping x : ℕ → ℝ × ℝ^m, where we write x_n instead of x(n) and (x_i)_{i ∈ {1,…,n}} for a series of length n.
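As an illustrative transcription of this definition into code (an assumed representation, not from the chapter): each element pairs an index component with an m-dimensional value vector, and raw audio becomes the special case m = 1 with time as the index.

```python
# Sketch of Definition 1: an element of a value series pairs an index
# component (a position on a line, e.g., time) with a value vector.
from dataclasses import dataclass

@dataclass
class Element:
    index: float   # index component, e.g., time in seconds
    value: tuple   # m-dimensional value vector from the value space

def audio_to_value_series(samples, sample_rate):
    """View raw audio (univariate, finite, equidistant) as a value
    series with m = 1 and equidistant time indices."""
    return [Element(i / sample_rate, (s,)) for i, s in enumerate(samples)]

series = audio_to_value_series([0.0, 0.5, -0.5, 0.25], sample_rate=4)
```

Transformations such as windowing or a spectral transform then map one value series to another, which is why the methods below are stated for value series rather than for audio alone.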
This general definition covers time series as
well as their transformations. All the methods
described in the following refer to value series.
They are of course not only applicable to audio
data, but to value series in general. The usage of
a complex number value space instead of a real