Information Technology Reference
exact term chosen by the user for the query, but it contains other variants. For example, a document that contains the words “computing”, “compute”, and “computation” many times is likely to address the subject of “computers” even if this exact word is missing. On the other hand, many words, even though they stem from the same root, have evolved to express different meanings.
The basic idea of stemming is to conflate into a single index term all the words that have slightly different meanings but stem from a common morphological root. The positive effects are a generalization of the concepts, which are carried by the stems rather than by the single words, and a lower number of index terms. The higher generalization is expected to improve recall, probably at the cost of lower precision. There are different approaches to automatic stemming, depending on the morphology of the language and on the techniques used. Research on stemming is still very active, in particular for languages other than English that have a rich morphological structure, with derivations expressed by prefixes, infixes, and suffixes.
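To make the idea concrete, here is a minimal Python sketch of a toy suffix-stripping stemmer feeding an inverted index. The suffix list, the minimum stem length, and the names `stem` and `build_index` are hypothetical; this is not the Porter algorithm or any other published stemmer, which rely on much richer, language-specific rule sets.

```python
# Toy suffix-stripping stemmer (NOT the Porter algorithm): it removes the
# longest suffix from a small hypothetical list, keeping at least MIN_STEM
# characters so that short words are not stripped down to nothing.
from collections import defaultdict

SUFFIXES = ("ation", "ing", "ers", "er", "es", "e", "s")  # hypothetical list
MIN_STEM = 4  # hypothetical minimum stem length


def stem(word: str) -> str:
    """Strip the longest listed suffix, if enough of the word remains."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: each stem maps to the set of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    docs = {"d1": "computing compute computation", "d2": "computers"}
    # All four word forms collapse into the single index term "comput".
    print(build_index(docs))
```

With this rule set, “computing”, “compute”, “computation”, and “computers” all reduce to “comput”, so a query for “computers” also retrieves a document containing only the other three forms, which is the recall gain described above. The same blind suffix stripping would also merge unrelated words that happen to share a stem, which is where the loss of precision comes from.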
An analogue of stemming is regularly carried out in many approaches to music retrieval, where it is normally referred to as feature quantization. The main motivation for feature quantization in music processing is probably that every feature extraction process is error prone: quantization partially overcomes this problem if erroneous measurements are mapped to the same quantized value as the correct ones. For example, because pitch detectors are known to produce octave errors, a solution often proposed in the literature is to represent only the names of the notes, with their possible alterations, and not their actual octave (Birmingham et al., 2001). Automatic chord detection from polyphonic audio signals is still very error prone, so quantization to a fixed number of chords (for example, triads only) may help remove part of the measurement noise. Quantization can also be useful when automatic detection is reliable but the signal itself is known in advance to vary, as is the case for note onsets and note durations, which differ even between performances of the same score.
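The octave-folding and timing quantizations mentioned above can be sketched as follows. The sketch assumes pitches are given as MIDI note numbers (60 = middle C) and onsets in beats; the function names and the default grid of a quarter of a beat are illustrative choices, not taken from the cited work.

```python
# Feature quantization sketch. Illustrative assumptions: MIDI pitch numbers,
# onsets measured in beats, every altered note spelled with a sharp.
NOTE_NAMES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")


def pitch_class(midi_pitch: int) -> str:
    """Keep the note name (with its alteration) and drop the octave."""
    return NOTE_NAMES[midi_pitch % 12]


def quantize_onset(onset_beats: float, grid: float = 0.25) -> float:
    """Snap an onset to the nearest multiple of `grid` beats.

    Python's round() sends exact halfway cases to the even multiple.
    """
    return round(onset_beats / grid) * grid


if __name__ == "__main__":
    true_melody = [60, 64, 67]  # C4 E4 G4
    detected = [48, 64, 79]     # octave errors on the first and last notes
    print([pitch_class(p) for p in true_melody])  # ['C', 'E', 'G']
    print([pitch_class(p) for p in detected])     # ['C', 'E', 'G']
    print([quantize_onset(t) for t in [0.0, 0.98, 2.03, 2.49]])  # [0.0, 1.0, 2.0, 2.5]
```

Spelling every altered note with a sharp also merges enharmonic spellings such as D# and Eb, which is a further loss of information introduced by this particular sketch.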
Quantization can also be useful as a stemming procedure. It is well known that many compositions are based on a limited amount of musical material, which is presented and then varied and developed during the piece. In this case, conflating the different thematic variations into a single index term will improve recall, because the user may choose any of these variations to express the same information need. Quantization can be carried out on any musical dimension, and at different levels. Table 1 shows possible approaches to the quantization of melodic intervals, some of them already proposed in the literature, from the most fine-grained to the coarsest. Figure 2 gives a graphical representation of the amount of information that is lost through quantization, in particular when melodic or rhythmic information is quantized into a single level and thus discarded.
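Quantizing melodic intervals at increasing coarseness can be sketched as below. The four levels and the threshold separating steps from leaps (up to 2 semitones) are assumptions chosen for illustration and do not reproduce Table 1; the contour level follows the familiar up/down/repeat encoding, and the last level shows melodic information quantized into a single value and thus discarded.

```python
# Illustrative interval quantization levels, from fine-grained to coarse.
# Pitches are MIDI note numbers; levels and thresholds are hypothetical.
from typing import Callable, Hashable


def intervals(pitches: list[int]) -> list[int]:
    """Signed semitone differences between consecutive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def step_leap(iv: int) -> int:
    """-2 leap down, -1 step down, 0 repeat, 1 step up, 2 leap up."""
    direction = (iv > 0) - (iv < 0)
    size = 0 if iv == 0 else (1 if abs(iv) <= 2 else 2)
    return direction * size


def contour(iv: int) -> str:
    """U(p), D(own) or R(epeat)."""
    return "U" if iv > 0 else "D" if iv < 0 else "R"


def single_level(_iv: int) -> str:
    """Every interval gets the same symbol: only the number of notes survives."""
    return "*"


LEVELS: dict[str, Callable[[int], Hashable]] = {
    "exact": lambda iv: iv,
    "step_leap": step_leap,
    "contour": contour,
    "single": single_level,
}


def quantize(pitches: list[int], level: str) -> tuple:
    """Quantize the interval sequence of a melody at the requested level."""
    q = LEVELS[level]
    return tuple(q(iv) for iv in intervals(pitches))


if __name__ == "__main__":
    melody = [60, 62, 64, 64, 67]  # C D E E G
    for name in LEVELS:
        print(name, quantize(melody, name))
```

Each level maps more melodies onto the same quantized sequence, so moving down the list trades precision for recall, in line with the stemming analogy.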
Application to the Music Domain
The idea behind stemming is that two index terms may be different yet be perceived or considered as similar. Analogously, two musical lexical units may be slightly different, yet listeners can perceive them as almost identical, confuse one with the other when recalling them from memory, or consider that they play a similar role in the musical structure. For instance, two identical rhythmic patterns played at different tempi and with small variations in the actual onset times, two musical phrases that differ only in one interval that turns from major to minor, and chord progressions where one chord is substituted by another with a similar function, as is routinely done in jazz, are all situations where stemming may become useful. In practice, all the perceptually similar variants could be conflated into a common stem.
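Two of the situations just listed can be sketched as stemming functions. Both are hypothetical illustrations: `melodic_stem` keeps only the direction and the generic size of each interval (so a major and a minor third share one stem, and compound intervals are folded into a single octave), while `rhythm_stem` expresses inter-onset intervals relative to the first one and snaps them to a grid, so that a change of tempo and small onset deviations disappear. The reference interval and the grid size are assumptions; chord substitution is not covered.

```python
# Hypothetical music "stemmers": perceptually similar variants -> same stem.
GENERIC_SIZE = {0: "unison", 1: "2nd", 2: "2nd", 3: "3rd", 4: "3rd", 5: "4th",
                6: "tritone", 7: "5th", 8: "6th", 9: "6th", 10: "7th", 11: "7th"}


def melodic_stem(pitches: list[int]) -> tuple[str, ...]:
    """Direction plus generic interval size; major/minor quality is dropped.

    Intervals are folded modulo 12, so an octave is treated like a unison.
    """
    stem = []
    for a, b in zip(pitches, pitches[1:]):
        iv = b - a
        direction = "+" if iv > 0 else "-" if iv < 0 else "="
        stem.append(direction + GENERIC_SIZE[abs(iv) % 12])
    return tuple(stem)


def rhythm_stem(onsets: list[float], grid: float = 0.25) -> tuple[float, ...]:
    """Inter-onset intervals divided by the first one, snapped to `grid`.

    Dividing by a reference interval removes the tempo; snapping removes
    small timing deviations. Assumes strictly increasing onsets and an
    accurate first interval.
    """
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    if not iois:
        return ()
    ref = iois[0]
    return tuple(round(ioi / ref / grid) * grid for ioi in iois)


if __name__ == "__main__":
    # Major vs. minor third at the start of the phrase: same melodic stem.
    print(melodic_stem([60, 64, 67]), melodic_stem([60, 63, 67]))
    # The same rhythm at 120 bpm (exact) and at 100 bpm (small deviations).
    print(rhythm_stem([0.0, 0.5, 0.75, 1.0, 1.5]))
    print(rhythm_stem([0.0, 0.61, 0.89, 1.22, 1.79]))
```

Both performances of the rhythm yield the stem (1.0, 0.5, 0.5, 1.0). Normalizing by the first interval is the simplest choice but makes the stem sensitive to an error in that interval; a more robust reference would be a design decision of its own.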