most used dimension in music retrieval. Another
interesting comparison of approaches to music
retrieval has been presented in Hu and Dannen-
berg (2002), where the focus was on alternative
representations for a dynamic programming
approach, both from the retrieval effectiveness
and from the computational cost points of view.
In the presented study the computational costs of
the tested approaches were comparable, and thus
results are not reported.
The organization of a number of evaluation
campaigns by the research community working
on the different aspects of music access, retrieval,
and feature extraction (IMIRSEL, 2006), which
started in 2005 (preceded in 2004 by an evalu-
ation effort on audio analysis), will increasingly
allow for the comparison of different approaches
to music indexing, using standard collections
(Downie, Futrelle & Tcheng, 2004).
straightforward, and can be carried out in linear
time. The idea underlying this approach is that
the effect of musically irrelevant N-grams will be
compensated by the presence of all the musically
relevant ones. It is common practice to choose
small values for N, typically from 3 to 7 notes,
because short units give higher recall, which is
considered more important than the resulting
loss of precision. Fixed-length segmentation
can be extended to polyphonic scores,
with the aim of extracting all relevant monophonic
tokens from concurrent multiple voices
(Doraisamy & Rüger, 2004).
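The fixed-length approach can be illustrated with a minimal sketch. It assumes a monophonic melody encoded as a list of MIDI pitch numbers; the function name fixed_length_units and the sample melody are hypothetical and are not taken from the cited works. For a fixed N, one pass over the melody produces all overlapping units.

```python
from typing import List, Sequence, Tuple


def fixed_length_units(melody: Sequence[int], n: int) -> List[Tuple[int, ...]]:
    """Return every overlapping window of exactly n consecutive notes."""
    if n <= 0:
        raise ValueError("n must be positive")
    # One window per starting position; a melody shorter than n yields none.
    return [tuple(melody[i:i + n]) for i in range(len(melody) - n + 1)]


# Hypothetical melody as MIDI pitch numbers (C D E C D E G).
melody = [60, 62, 64, 60, 62, 64, 67]
units = fixed_length_units(melody, 3)
```

Every starting position contributes a unit, so no assumption is made about where a theme begins; units that cross a musically meaningful boundary are simply indexed alongside the relevant ones.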
Data-Driven Segmentation (DD)
Segmentation can be performed considering
that typical passages of a given melody tend to
be repeated many times (Pienimäki, 2002). The
repetitions can simply be due to the presence of
different choruses in the score or can be related
to the use of the same melodic material along
the composition. Each sequence that is repeated
at least K times (normally twice) is usually
defined as a pattern, and is used for the description
of a music document. This approach is called data-
driven because patterns are computed only from
the document data without exploiting knowledge
on music perception or structure. This approach
can be considered as an extension of the N-grams
approach, because DD units can be of any length,
with the limitation that they have to be repeated
inside the melody; subpatterns that are included
in longer patterns are discarded if they have the
same multiplicity. Patterns can be computed from
different features, like pitch or rhythm, each fea-
ture giving a different set of DD units to describe
document content. Patterns can be truncated by
applying a given threshold, to reduce the size of
the index and to achieve a higher robustness to
local errors in the query (Neve & Orio, 2004). The
extension to polyphonic scores can be carried out
similarly to the FL approach.
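A brute-force sketch of the data-driven idea follows. It is an illustration under stated assumptions, not the algorithm of the cited works: the melody is a list of MIDI pitch numbers, overlapping occurrences are counted, and the names data_driven_units, min_len, and max_len are hypothetical. The max_len parameter plays the role of the truncation threshold.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

Pattern = Tuple[int, ...]


def data_driven_units(seq: Sequence[int], k: int = 2, min_len: int = 2,
                      max_len: Optional[int] = None) -> Dict[Pattern, int]:
    """Map each repeated pattern to its multiplicity (number of occurrences)."""
    n = len(seq)
    longest = n if max_len is None else min(max_len, n)
    # Count every contiguous subsequence of each admissible length.
    counts: Counter = Counter()
    for length in range(min_len, longest + 1):
        for start in range(n - length + 1):
            counts[tuple(seq[start:start + length])] += 1
    # Keep only the sequences repeated at least k times.
    repeated = {p: c for p, c in counts.items() if c >= k}
    # A subpattern is discarded when a pattern one note longer contains it
    # with the same multiplicity; chaining this covers all longer patterns.
    absorbed = set()
    for p, c in repeated.items():
        for sub in (p[:-1], p[1:]):
            if repeated.get(sub) == c:
                absorbed.add(sub)
    return {p: c for p, c in repeated.items() if p not in absorbed}


patterns = data_driven_units([60, 62, 64, 60, 62, 64, 67])
```

In the sample melody the three-note figure occurs twice, and its two-note subpatterns are dropped because they occur exactly as often. Applying the same function to a sequence of durations instead of pitches would give a different set of units for the same document. The exhaustive counting is only meant to make the definition concrete; it is not efficient for long melodies.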
Approaches to Melodic Segmentation
The approaches to music segmentation can be
roughly divided into two main groups: the ones that
highlight the lexical units using only the document
content, and the ones that exploit prior information
about music theory and perception. Four
different approaches, two for each group, have
been tested.
Fixed-Length Segmentation (FL)
The simplest segmentation approach consists of
the extraction from a melody of subsequences
of exactly N notes, called N-grams (Downie &
Nelson, 2000). N-grams may overlap, because no
assumption is made on the possible starting point
of a theme, nor on the possible repetitions of
relevant music passages. The strength of this approach
is its simplicity, because it is based neither
on assumptions about theories of music composition
or perception, nor on the analysis of complete melodies.
The exhaustive computation of FL units is