Content-Based Indexing of Symbolic Music Documents - Intelligent Music Information Systems: Tools and Methodologies


Figure 3. Parallel processing of documents and queries aimed at retrieval of potentially relevant documents

Retrieval can then be carried out by measuring the similarity (or the distance) between the two strings and ranking the retrieved documents by decreasing similarity (or increasing distance) from the query. The cost of these techniques grows linearly with the size of the document collection, because every document has to be matched against the query.
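
The exhaustive approach can be sketched as follows. This is a minimal illustration, not the chapter's own method. It assumes melodies are already encoded as strings of symbols (here plain note names). As the distance it uses the smallest edit distance between the query and any substring of a document, a standard approximate string matching measure chosen here because a query is usually only an excerpt. The function names are hypothetical. Every document is scanned, so the running time grows linearly with the collection.

```python
from typing import Dict, List, Sequence, Tuple


def substring_edit_distance(query: Sequence[str], doc: Sequence[str]) -> int:
    """Smallest edit distance between the query and any substring of doc."""
    # prev[j] = cheapest alignment of the first i query symbols ending at doc[j-1].
    # Row 0 is all zeros, so a match may start anywhere in the document.
    prev = [0] * (len(doc) + 1)
    for i in range(1, len(query) + 1):
        curr = [i] + [0] * len(doc)
        for j in range(1, len(doc) + 1):
            cost = 0 if query[i - 1] == doc[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # query symbol missing from doc
                          curr[j - 1] + 1,      # extra symbol in doc
                          prev[j - 1] + cost)   # match or substitution
        prev = curr
    # Taking the minimum over the last row lets the match end anywhere.
    return min(prev)


def rank_exhaustive(query: Sequence[str],
                    collection: Dict[str, Sequence[str]]) -> List[Tuple[str, int]]:
    """Match the query against every document; rank by increasing distance."""
    scores = [(doc_id, substring_edit_distance(query, doc))
              for doc_id, doc in collection.items()]
    return sorted(scores, key=lambda item: (item[1], item[0]))


print(rank_exhaustive("DEF", {"a": "CDEFG", "b": "GGAB", "c": "CDFFG"}))
# [('a', 0), ('c', 1), ('b', 3)]
```

Document "a" contains the query exactly. Document "c" needs one substitution. Document "b" shares no symbol with the query.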

Indexing techniques do not require this exhaustive comparison. In fact, the main motivation behind indexing is its efficiency and its scalability to very large document collections. Each index term can be regarded as a pointer to the list of documents that contain it. It is assumed that the number of documents in each list is small compared with the number of documents in the collection, apart from stop-words, which are usually not used as index terms. This assumption certainly holds for textual documents. It also applies to music documents, because different melodies generally have different thematic material. Index terms can then be stored in efficient data structures. Examples are hash tables, which can be accessed in (expected) constant time, and balanced binary search trees, which can be accessed in logarithmic time.
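
A minimal sketch of such an index follows, with each term pointing to the documents that contain it and a Python dict serving as the hash table. The choice of index terms is an assumption made only for illustration and is not taken from the chapter. Terms are overlapping 3-grams of pitch intervals (in semitones), computed from MIDI pitch sequences. Documents are ranked by how many distinct query terms they share, a simple stand-in for the chapter's ranking. All names are hypothetical, and the two toy documents are the openings of two familiar tunes.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Term = Tuple[int, ...]


def interval_ngrams(pitches: Sequence[int], n: int = 3) -> List[Term]:
    """Hypothetical index terms: overlapping n-grams of pitch intervals (semitones)."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]


def build_index(collection: Dict[str, Sequence[int]], n: int = 3) -> Dict[Term, Set[str]]:
    """Map each index term to the set of documents containing it (a hash table)."""
    index: Dict[Term, Set[str]] = defaultdict(set)
    for doc_id, pitches in collection.items():
        for term in interval_ngrams(pitches, n):
            index[term].add(doc_id)
    return dict(index)


def search(index: Dict[Term, Set[str]], query: Sequence[int], n: int = 3) -> List[Tuple[str, int]]:
    """Look up each distinct query term exactly; rank documents by shared terms."""
    votes: Dict[str, int] = defaultdict(int)
    for term in set(interval_ngrams(query, n)):
        for doc_id in index.get(term, ()):  # expected O(1) hash lookup, no scan
            votes[doc_id] += 1
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


collection = {
    "ode": [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64, 64, 62, 62],
    "twinkle": [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60],
}
index = build_index(collection)
print(search(index, [65, 67, 67, 65, 64]))  # [('ode', 2), ('twinkle', 1)]
```

Only the posting lists of the query's own terms are visited, so the work depends on the length of those lists rather than on the size of the collection. Because the terms are built from intervals, a transposed query such as [70, 72, 72, 70, 69] yields the same terms and the same ranking.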

The efficiency gained through indexing comes at some cost in retrieval effectiveness. The main issue is that, to be efficient, access to the data structure requires an exact match between the index terms of documents and those of the query. This assumption is reasonable for textual documents, because the user is expected to spell the words of the query correctly. In the music domain, however, there are many sources of mismatch that may affect retrieval effectiveness. A melodic query may contain errors due to imprecise recall of the melody, or it may be a different variant of a particular theme. These differences may affect the way index terms are computed from the query and the way they are represented. For this reason, some peculiar aspects of music document indexing are addressed in more detail.
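
The following sketch makes the effect of a single mismatch concrete. It rests on an illustrative assumption that is not taken from the chapter: index terms are overlapping 3-grams of pitch intervals. The sketch counts how many of a query's terms an exact lookup would still find when one note is sung wrongly. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def interval_ngrams(pitches: Sequence[int], n: int = 3) -> List[Tuple[int, ...]]:
    """Hypothetical index terms: overlapping n-grams of pitch intervals (semitones)."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]


def surviving_terms(correct: Sequence[int], noisy: Sequence[int], n: int = 3) -> Tuple[int, int]:
    """Return (terms still found by exact lookup, terms in the correct query)."""
    correct_terms = set(interval_ngrams(correct, n))
    noisy_terms = set(interval_ngrams(noisy, n))
    return len(correct_terms & noisy_terms), len(correct_terms)


correct = [64, 64, 65, 67, 67, 65, 64, 62]
wrong_last = [64, 64, 65, 67, 67, 65, 64, 60]    # last note sung a whole tone too low
wrong_middle = [64, 64, 65, 67, 66, 65, 64, 62]  # fifth note sung a semitone flat
print(surviving_terms(correct, wrong_last))    # (4, 5)
print(surviving_terms(correct, wrong_middle))  # (1, 5)
```

A wrong final note alters only the last interval, so only one term is lost. A wrong note in the middle alters the two intervals on either side of it, and with them every 3-gram that covers either interval. That leaves a single term for the exact lookup to find.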

involve the application of noise reduction and pitch tracking techniques (de Cheveigné & Baskind, 2003) in the case of audio queries. These are followed by segmentation, which can be carried out with the same technique used for document segmentation or with different techniques tailored to the peculiarities of the queries. Figure 3 represents the parallel processing of documents and queries aimed at computing the RSV (retrieval status value) used to rank relevant documents.

Efficiency and Effectiveness

It can be argued that all these steps, although useful for indexing textual documents, are not necessary for a music retrieval task that can be solved directly within an approximate string matching framework, as mentioned in the introduction of this chapter. For instance, the main melodies of music documents can be represented by arrays of symbols, where the number of distinct symbols depends on the kind of quantization applied to melodic and rhythmic information. The user's query is normally an excerpt of a complete melody and can thus undergo the same representation.
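
The dependence of the alphabet on quantization can be sketched as follows. The scheme is hypothetical and chosen only for illustration. Each pair of consecutive notes, given as (MIDI pitch, duration in beats), becomes one symbol. The pitch part is either the melodic contour (up, down, repeat) or the exact interval clipped to one octave. The rhythm part is the duration ratio quantized to shorter, equal, or longer. Contour yields 3 × 3 = 9 symbols, while clipped intervals yield 25 × 3 = 75.

```python
from math import log2
from typing import List, Sequence, Tuple

Note = Tuple[int, float]  # (MIDI pitch, duration in beats); durations must be > 0


def contour(interval: int) -> str:
    """Coarse pitch quantization: Up, Down or Repeat."""
    return "U" if interval > 0 else "D" if interval < 0 else "R"


def duration_class(prev_dur: float, dur: float) -> str:
    """Quantize the duration ratio on a log2 scale: Shorter, Equal or Longer."""
    r = log2(dur / prev_dur)
    return "S" if r < -0.5 else "L" if r > 0.5 else "E"


def encode(melody: Sequence[Note], fine_pitch: bool = False) -> List[str]:
    """Turn a melody into an array of symbols, one per pair of consecutive notes."""
    symbols = []
    for (p0, d0), (p1, d1) in zip(melody, melody[1:]):
        interval = p1 - p0
        if fine_pitch:
            pitch_sym = f"{max(-12, min(12, interval)):+d}"  # clipped to one octave
        else:
            pitch_sym = contour(interval)
        symbols.append(pitch_sym + duration_class(d0, d1))
    return symbols


melody = [(60, 1.0), (62, 1.0), (62, 2.0), (59, 0.5)]
print(encode(melody))                   # ['UE', 'RL', 'DS']
print(encode(melody, fine_pitch=True))  # ['+2E', '+0L', '-3S']
```

Each symbol depends only on two consecutive notes. An excerpt such as melody[1:] is therefore encoded as a contiguous run of the full melody's symbols, and approximate string matching can be applied to it directly.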
