Along with these, many secondary characteristics may also be stored, such as the volume at which the note should be performed, or performance variations such as trilling or vibrato, which are dependent upon the specific instrument or voice involved. Finally, information about the song as a whole will be present, both musical (e.g., musical key(s), intended tempo(s), or other performance notes) and metadata (e.g., title, composer, or year published). Printed musical scores are the most familiar means of representing music in this manner, but many other formats exist to fulfill the same purpose. An excellent volume describing most of the relevant data formats was edited by Selfridge-Field (1997).
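To make the kinds of attributes involved concrete, the sketch below models a note and its song-level context as simple Python data structures. The field names are illustrative choices only, not drawn from any particular format.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Note:
    pitch: int                           # e.g., a MIDI note number (60 = middle C)
    start: float                         # onset time, in beats
    duration: float                      # length, in beats
    volume: int = 64                     # performance loudness (0-127)
    articulation: Optional[str] = None   # e.g., "trill" or "vibrato"

@dataclass
class Song:
    title: str
    composer: str
    year: int
    key: str                             # e.g., "C major"
    tempo_bpm: float                     # intended tempo
    notes: List[Note] = field(default_factory=list)

# A one-note "song" carrying both note-level and song-level information
song = Song("Example", "Anon.", 1999, "C major", 120.0,
            [Note(pitch=60, start=0.0, duration=1.0, articulation="trill")])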
One of the most popular of these high-level representation formats is known as MIDI. MIDI (Musical Instrument Digital Interface) (1993) was initially designed as a protocol for the networking and control of electronic musical instruments, which at first consisted mainly of synthesizer keyboards but later included many other instruments and devices. The protocol defines a series of messages which control timing and the playing of notes via events. MIDI supports the segmentation of music data into separate tracks, which are typically used to represent the individual instruments and voices needed to perform the song being stored. These tracks are often given identifying labels indicating the instrument or voice represented by the data within each track. The Standard MIDI File (SMF) format is the complementary data format developed to store MIDI information in individual files. In addition to the track and note data described, the storage format also accommodates additional information, including song lyrics, time and key signatures, tempo(s), author, and title. The MIDI standard has been universally adopted by manufacturers of synthesizer keyboards and other digital instruments. Rothstein's book (1995) describes the standard in detail and discusses its implementation in various products. Many thousands of MIDI song files have been published in the public domain. Several research groups have assembled their own collections of MIDI files to create test databases for their MIR research systems, including those described by Jang, Chen, and Kao (2001a); Kosugi, Nishihara, Sakata, Yamamuro, and Kushima (2000); and Uitdenbogerd and Zobel (1998).
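As a concrete illustration of the file format, the sketch below writes a minimal format-0 Standard MIDI File containing a named track, a tempo setting, and a single middle C. It assumes only the publicly documented SMF chunk layout (an "MThd" header chunk, an "MTrk" track chunk, and variable-length delta times before each event); the note choice and track name are arbitrary.

import struct

def vlq(value):
    """Encode an integer as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))

TICKS_PER_QUARTER = 480

# Track events as (delta time in ticks, raw event bytes)
events = [
    (0,   b'\xFF\x03' + vlq(5) + b'Piano'),               # track name meta event
    (0,   b'\xFF\x51\x03' + (500000).to_bytes(3, 'big')), # tempo: 500,000 us/quarter = 120 BPM
    (0,   b'\x90\x3C\x40'),                               # note on: middle C, velocity 64
    (480, b'\x80\x3C\x40'),                               # note off after one quarter note
    (0,   b'\xFF\x2F\x00'),                               # end-of-track meta event
]

track_data = b''.join(vlq(dt) + ev for dt, ev in events)
header = b'MThd' + struct.pack('>IHHH', 6, 0, 1, TICKS_PER_QUARTER)
track  = b'MTrk' + struct.pack('>I', len(track_data)) + track_data

with open('middle_c.mid', 'wb') as f:
    f.write(header + track)

The resulting file plays on any SMF-compliant synthesizer, which illustrates why the format became a lingua franca for exchanging logical music descriptions.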
Conversion of a logical description of music into an audible representation is quite straightforward: such a description need only be interpreted in a performance, whether by live performers or by a music synthesis system. On the other hand, automated transcription, the conversion of a digitized audio recording into a logical data representation, is a much more difficult problem. Klapuri is one of many researchers in this field, and he maintains an excellent literature review of this research area on his personal Web site (Klapuri, n.d.).
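The straightforward forward direction can be shown in a few lines. The sketch below renders a list of (MIDI note, duration) pairs as a sine-wave WAV file using only the Python standard library; it is a toy synthesizer under simplifying assumptions (pure tones, a linear fade-out), not any particular product's method.

import math, struct, wave

SAMPLE_RATE = 44100

def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def render(notes, path):
    """Render (midi_note, duration_seconds) pairs as a mono 16-bit WAV."""
    samples = []
    for note, dur in notes:
        freq = midi_to_hz(note)
        n = int(dur * SAMPLE_RATE)
        for i in range(n):
            env = 1.0 - i / n          # linear fade-out to avoid clicks
            samples.append(0.5 * env * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(b''.join(struct.pack('<h', int(s * 32767)) for s in samples))

# A C major arpeggio: middle C, E, G, each half a second long
render([(60, 0.5), (64, 0.5), (67, 0.5)], 'arpeggio.wav')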
With the prospect of generalized automated music transcription likely many years away, an approach that should prove more successful in the short term is the introduction of new hybrid audio data formats which allow logical descriptors to be stored alongside representations of the digital waveform in a common file. Examples of ongoing efforts in this area include MPEG-7 (Martinez, 2004). This type of format will afford the content creator the opportunity to include multiple representations of a piece of music in the same file, which could provide access to logical descriptors of the otherwise opaque digitized audio stream.
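The idea can be sketched as follows. The container below is purely hypothetical (it does not follow the MPEG-7 schema or any published standard) and exists only to show a waveform and its logical descriptors travelling together in one file.

import json, struct

def write_hybrid(path, wav_path, descriptors):
    """Bundle a WAV payload with a JSON block of logical descriptors.
    The 'HYBR' magic number and length-prefixed layout are invented
    here for illustration; real formats define far richer schemas."""
    meta = json.dumps(descriptors).encode('utf-8')
    audio = open(wav_path, 'rb').read()
    with open(path, 'wb') as f:
        f.write(b'HYBR')
        f.write(struct.pack('<II', len(meta), len(audio)))
        f.write(meta)
        f.write(audio)

# Pair the arpeggio rendered earlier with a logical description of its notes
write_hybrid('song.hyb', 'arpeggio.wav', {
    'title': 'Example',
    'key': 'C major',
    'notes': [{'pitch': 60, 'start': 0.0, 'duration': 0.5}],
})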
AUTOMATED TRANSCRIPTION OF THE HUMAN VOICE
While efforts to expand the capabilities of music transcription will continue in the coming years, systems already exist which can accurately transcribe music sources in which only one, or very few, instruments or voices are heard at a time.
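Monophonic sources are tractable because a single voice carries one fundamental frequency at a time. The sketch below estimates that frequency for one analysis frame by autocorrelation, one of the simplest of the many published approaches; the frame length and search range are arbitrary choices made for illustration.

import math

SAMPLE_RATE = 44100

def detect_pitch(frame, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of one mono frame by
    finding the autocorrelation peak within a plausible lag range."""
    lo = int(SAMPLE_RATE / fmax)            # shortest lag considered
    hi = int(SAMPLE_RATE / fmin)            # longest lag considered
    best_lag, best_score = lo, float('-inf')
    for lag in range(lo, min(hi, len(frame) - 1)):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag

def hz_to_midi(freq):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq / 440.0))

# A synthetic 440 Hz test tone; detect_pitch should recover note 69 (A4)
frame = [math.sin(2 * math.pi * 440.0 * i / SAMPLE_RATE) for i in range(2048)]
print(hz_to_midi(detect_pitch(frame)))      # -> 69

Repeating this over successive frames, then grouping consecutive frames of equal pitch into notes, yields a basic monophonic transcriber; polyphonic sources defeat this scheme because overlapping voices blur the autocorrelation peaks.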