Content-Based Indexing of Symbolic Music Documents - Intelligent Music Information Systems: Tools and Methodologies


show that all documents contain words such as “music”, “computer”, “note”, “algorithm”, and so on. Moreover, these words are probably evenly distributed across the collection, and their contribution to specifying the content of a particular document with respect to the others is very low. In this case too, a collection-dependent stop-list can be created, and words belonging to the stop-list can be ignored in subsequent phases of document indexing. The stop-list can be computed automatically by analyzing a representative sample of the collection and adding to the stop-list all the words that consistently appear in all (or in a high percentage) of the analyzed documents. Clearly, this kind of analysis would also highlight the words that have a grammatical function and no semantic content, as described above, so a two-step removal of stop-words can be avoided. Nevertheless, the designer of an IR system can choose to remove only the words that are both frequent and uninformative, keeping the ones that are only frequent.
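The automatic computation described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `build_stop_list`, the parameter `min_doc_fraction`, the 0.9 threshold, and the toy sample are all hypothetical and do not come from the original text; documents are assumed to be already tokenized into lists of lowercase words.

```python
from collections import Counter


def build_stop_list(documents, min_doc_fraction=0.9):
    """Return the words that occur in at least `min_doc_fraction` of the documents.

    `documents` is a list of token lists (an assumed input format).
    """
    doc_freq = Counter()
    for tokens in documents:
        # Count each word once per document: document frequency, not term frequency.
        doc_freq.update(set(tokens))
    threshold = min_doc_fraction * len(documents)
    return {word for word, df in doc_freq.items() if df >= threshold}


# Hypothetical sample of a collection about computer music.
sample = [
    "the music of the computer".split(),
    "the note and the algorithm in music".split(),
    "music is the art of sound".split(),
    "the computer plays music".split(),
]
stop_list = build_stop_list(sample, min_doc_fraction=0.9)
```

With this toy sample, both the function word “the” and the collection-specific word “music” end up in the stop-list, which mirrors the observation that a single pass can catch both kinds of word. A designer who wants to keep words that are frequent but still informative would need an additional filtering criterion, which this sketch does not attempt.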

The choice of the particular stop-list to use, if any, could be driven by both musicological and computational motivations and by the characteristics of the music collection itself. A statistical analysis of the distribution of lexical units across documents may highlight the potential stop-words. It should be noted that this approach is not usually exploited in the literature on music indexing and retrieval. The term “stop-list” is quite infrequent in music retrieval, and the common approach is to select the parameters carefully, so as to avoid computing lexical units that are believed to be uninformative about the document content. What is important for this discussion is that not all lexical units are equally informative about the document content and its differences from other documents in the collection (which is the aim of term weighting, described below), and that some lexical units may be totally uninformative, acting as a sort of background noise.
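As a rough illustration of such a statistical analysis on music documents, the sketch below measures in what fraction of a collection each two-note melodic interval occurs. It rests on several assumptions that are not in the original text: melodies are represented as lists of MIDI note numbers, a lexical unit is the signed interval in semitones between consecutive notes, and the function names and the three toy melodies are hypothetical.

```python
from collections import Counter


def interval_units(pitches):
    """Two-note lexical units: signed semitone intervals between consecutive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def document_fraction(melodies):
    """Map each interval to the fraction of melodies that contain it at least once."""
    doc_freq = Counter()
    for pitches in melodies:
        doc_freq.update(set(interval_units(pitches)))
    return {unit: df / len(melodies) for unit, df in doc_freq.items()}


# Hypothetical toy collection (MIDI note numbers, 60 = middle C).
melodies = [
    [60, 62, 64, 65, 67],      # ascending stepwise line
    [67, 65, 64, 62, 60, 62],  # descending line with a final step up
    [60, 64, 67, 69, 67],      # arpeggio followed by steps
]
fractions = document_fraction(melodies)

# Units sorted from most to least widespread across the collection.
ranked = sorted(fractions.items(), key=lambda item: item[1], reverse=True)
```

In this toy collection the ascending whole tone (+2 semitones) appears in every melody, so it carries little information about any single document, while the major third (+4) appears in only one. Units near the top of `ranked` are the natural candidates for a stop-list; where to draw the line remains a design choice.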

Application to the Music Domain

It is difficult to state whether or not a musical lexical unit has a meaning, and hence to create a priori a stop-list of musical lexical units that can be ignored during indexing. It is preferable to approach the problem by considering how good a discriminator a particular unit is between different music documents. For instance, when indexing melodic intervals, a lexical unit of two notes that form a major second is likely to be present in almost all of the documents, and is thus not a good index term for a collection of “cantate” of tonal Western music, and probably for any collection of music documents. A single major chord is unlikely to be a good discriminator either. Depending on the particular set of features used to index a music collection, the designer of the indexing and retrieval engine can make a number of choices about the possible stop-list of lexical units.

Stemming

Many words, though spelled differently, can be considered variants that stem from a common morphological root. This is the case of the English words “music”, “musical” (adjective and substantive), “musicology”, “musician”; the number of variants may increase if singular and plural forms are taken into account, together with gender information (which does not apply to English but applies to most European languages) and other possible variants that are peculiar to some languages. Moreover, in many languages verbs are conjugated, that is, the root of the verb is varied depending on mood, person, and tense. Thus a textual document may contain different word variants, which lexical analysis identifies as different but which share a similar meaning. Intuitively, a textual document could be relevant for a given information need even if it does not contain the
