Content-Based Indexing of Symbolic Music Documents - Intelligent Music Information Systems: Tools and Methodologies


dexing because of the evident difference between textual and musical communication. One point worth mentioning is that users access the two media very differently. In particular, music documents are accessed many times by users, who may choose not to listen to the complete song but only to part of it. Moreover, it is common practice for radio stations to broadcast only the parts of songs containing the sung melody, skipping the intro and the coda and fading out during long guitar solos. The computation of the relative importance with which a lexical unit describes a document should also take these aspects into account. Furthermore, listeners are likely to remember, and to use in their queries, the part of the song where the title is sung, which therefore becomes more relevant regardless of its frequency inside the document and inside the collection. Yet very few studies have investigated the best weighting scheme for music indexing, and in many cases a direct implementation of tf·idf (such as the one presented in this section) is used.
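A direct tf·idf weighting of melodic lexical units can be sketched as follows. This is a minimal sketch under stated assumptions: the exact tf·idf variant used in this chapter is not reproduced in this section, so we assume raw term frequency multiplied by log(N/df), and we assume lexical units are short tuples of pitch intervals. The `tf_idf` function and the sample songs are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf.idf weights for every lexical unit of every document.

    docs maps a document id to its list of lexical units. Assumed variant:
    raw term frequency times log(N / df), where N is the number of documents
    and df the number of documents containing the unit.
    """
    n_docs = len(docs)
    df = Counter()
    for units in docs.values():
        df.update(set(units))  # count each unit at most once per document
    weights = {}
    for doc_id, units in docs.items():
        tf = Counter(units)
        weights[doc_id] = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    return weights


# Hypothetical lexical units: pairs of consecutive pitch intervals (semitones).
docs = {
    "song_a": [(2, 2), (2, -4), (2, 2)],
    "song_b": [(2, 2), (5, -1)],
    "song_c": [(0, 0), (5, -1)],
}
w = tf_idf(docs)
```

In this toy collection, (2, -4) occurs in one song only and receives a higher weight in song_a than the repeated but more common (2, 2). The weight depends only on counts, not on where in the song a unit occurs, which is precisely the limitation discussed above.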

It is important to note that the possibility of giving different weights to lexical units is an important difference between information retrieval and approaches based on recognition, such as approximate string matching techniques. The former ranks documents according to the relevance of their lexical units as content descriptors, while the latter ranks documents according to the degree to which an excerpt of each document matches the query. With recognition-based approaches, therefore, a good match with an almost irrelevant excerpt may yield a higher rank than a more approximate match with a highly relevant excerpt. It could thus be advisable to extend weighting approaches to methods other than indexing as well. To this end, a mixed approach combining indexing with approximate matching has been proposed in Basaldella and Orio (2006), where each index term was represented by a statistical model and the final weight of each index term of the query was computed by combining the tf·idf scheme with the probability with which the term was generated by the model.
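The recognition-based side of this contrast can be sketched with a classic approximate string matching technique: edit distance with a free start and end inside the document (Sellers' approximate substring matching). This technique is our illustrative choice, not the statistical-model approach of Basaldella and Orio (2006). Melodies are assumed to be sequences of pitch intervals, and the function names and data are hypothetical. Documents are ranked only by how closely their best excerpt matches the query; nothing in the score reflects how relevant that excerpt is as a content descriptor.

```python
def best_excerpt_distance(query, doc):
    """Smallest edit distance between `query` and any contiguous excerpt of `doc`.

    Sellers-style dynamic programming: column j holds the costs of aligning
    each query prefix with an excerpt ending at doc position j. The excerpt
    may start anywhere, so row 0 is always 0.
    """
    m = len(query)
    prev = list(range(m + 1))  # empty excerpt: cost i for a query prefix of length i
    best = prev[m]
    for symbol in doc:
        cur = [0]  # free start: an empty query prefix matches anywhere at no cost
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == symbol else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra symbol in the document
                           cur[i - 1] + 1))     # query symbol missing from the document
        best = min(best, cur[m])  # free end: the excerpt may stop here
        prev = cur
    return best


def rank_by_match(query, docs):
    """Rank document ids by increasing distance of their best-matching excerpt."""
    return sorted(docs, key=lambda d: best_excerpt_distance(query, docs[d]))


# Hypothetical interval sequences (semitones).
docs = {
    "song_c": [0, 0, 0],
    "song_b": [2, 3, -4, 7],
    "song_a": [5, 2, 2, -4, 1],
}
ranking = rank_by_match([2, 2, -4], docs)
```

Here song_a contains an exact excerpt of the query, song_b one substitution away, and song_c none at all, so the ranking follows the match quality alone, whatever the role of those excerpts within each song.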

Retrieval Techniques

Once indexes have been built through the four steps described earlier, and both the collection of documents and the user query have been indexed, retrieval can be performed. Note that the query, too, has to be analyzed and indexed in order to retrieve relevant documents, because the similarity between the query and the documents is computed using the indexes only.
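The requirement that queries and documents be indexed in the same way can be illustrated with a minimal sketch. The indexing below is a hypothetical stand-in for the steps described earlier, which are not reproduced in this section: melodies given as MIDI pitch numbers are turned into pitch intervals and then into overlapping n-grams of length 3 (an assumed choice). Because both sides pass through the same `index_melody` function, the query and the document can be compared purely through their index terms.

```python
def index_melody(pitches, n=3):
    """Hypothetical indexing step: MIDI pitches -> pitch intervals -> overlapping n-grams.

    The same function is applied to documents and to queries, so that the two
    can later be compared through their index terms only.
    """
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]


document = [60, 62, 64, 65, 67, 65, 64]
query = [67, 69, 71, 72]  # the opening of the document, transposed up a fifth

doc_terms = set(index_melody(document))
query_terms = set(index_melody(query))
shared = doc_terms & query_terms  # index terms available for computing similarity
```

Using intervals rather than absolute pitches makes the index terms invariant to transposition, which is why the transposed query still shares the term (2, 2, 1) with the document.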

Different approaches can be applied to retrieval; the most intuitive one, which has been extensively applied in the experiments reported in the following sections, is the Vector-Space Model (VSM). According to the VSM, both documents and queries are represented as $K$-variate vectors of descriptor weights $w_{t,d}$, where $K$ is the total number of unique descriptors or indexes. Document $d_i$ is thus represented as $d_i = (w_{i1}, \ldots, w_{iK})$, while query $q$ is represented as $q = (q_1, \ldots, q_K)$. The weight $w_{t,d}$ of index term $t$ within document $d$ is computed according to the tf·idf scheme already described. Query descriptor weights are usually binary: $q_t = 1$ if term $t$ occurs within query $q$, and $q_t = 0$ otherwise.
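The VSM representation just described can be sketched as follows, assuming the tf·idf weights have already been computed and are stored as one dictionary per document (a hypothetical data layout). The sorted vocabulary of all $K$ unique index terms fixes the order of the vector components; document components carry tf·idf weights, and query components are binary. The function name and the placeholder terms "A", "B", "C" are illustrative only.

```python
def to_vectors(doc_weights, query_terms):
    """Build K-variate vectors over the vocabulary of all K unique index terms.

    doc_weights maps a document id to {term: tf.idf weight}; terms must be
    mutually comparable so the vocabulary can be sorted into a fixed order.
    Query weights are binary: 1.0 if the term occurs in the query, else 0.0.
    Query terms absent from the collection have no component and are dropped.
    """
    vocabulary = sorted({t for weights in doc_weights.values() for t in weights})
    doc_vectors = {d: [weights.get(t, 0.0) for t in vocabulary]
                   for d, weights in doc_weights.items()}
    query_vector = [1.0 if t in query_terms else 0.0 for t in vocabulary]
    return vocabulary, doc_vectors, query_vector


# Placeholder labels standing in for melodic index terms.
doc_weights = {"d1": {"A": 0.8, "B": 1.1}, "d2": {"A": 0.4, "C": 0.4}}
vocabulary, doc_vectors, query_vector = to_vectors(doc_weights, {"A", "C", "Z"})
```

Dropping unseen query terms such as "Z" is a design choice of this sketch: a component that is zero in every document vector would not change any dot product anyway.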

The retrieval status value (RSV) is the cosine of the angle between the query vector and the document vector. That is:

$$\mathrm{RSV}(d, q) = \cos(\vec{d}, \vec{q}) = \frac{\vec{d} \cdot \vec{q}}{|\vec{d}|\,|\vec{q}|}$$

where $d$ and $q$ are the document and the query, respectively, $\vec{d}$ and $\vec{q}$ are their vector representations, and $|\vec{x}|$ is the norm of vector $\vec{x}$. Because the cosine normalizes the RSV by the lengths of the document and query vectors, long documents have the same chance of being retrieved as short ones.
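The cosine RSV and the resulting ranking can be sketched in a few lines. The vectors below are hypothetical and laid out as in the VSM description above (tf·idf document weights, binary query weights). Returning 0.0 for an all-zero vector is an assumed convention, since the cosine is undefined in that case.

```python
import math


def rsv(d, q):
    """Cosine of the angle between document vector d and query vector q."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # assumed convention: an empty vector matches nothing
    return dot / (norm_d * norm_q)


q = [1.0, 0.0, 1.0]            # binary query weights
short_doc = [0.5, 0.0, 0.5]
long_doc = [5.0, 0.0, 5.0]     # same proportions, ten times larger weights

docs = {"d1": [0.8, 1.1, 0.0], "d2": [0.4, 0.0, 0.4]}
ranking = sorted(docs, key=lambda d: rsv(docs[d], q), reverse=True)
```

Because both vectors are divided by their norms, `short_doc` and `long_doc` obtain the same RSV (up to floating-point rounding): only the direction of a document vector matters, not its magnitude.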

In order to be comparable, both documents and queries need to be transformed. This process usually corresponds to the segmentation of music documents into their lexical units and to a more complex processing of the query. The latter can
