For example, the word "home" from the song Sweet Home Alabama retrieves different associations from different repositories. Google's ranking returns a canonical photo of a home from a realtor's Web site. Flickr, in contrast, returns photos people took in their own homes, in this case of their child. The combination of image repositories (popular, canonical, and personal) provides a balance of associations which the agent uses to aid in the art's creation and to assist the audience's understanding of the work (Kandinsky, 1994; Shamma, 2005).
While it is possible to use simple search to create a sequence of images overlaying music, that alone does not make for a successful piece. Successful integration of sound and image relies on an intimate knowledge of the media itself, considering the available image resources (repositories, number of images per term), the musical parameters (tempo, dynamics, density, lyrics), and the output format (screen size, playback bit rate). To do this, MusicStory assumes the role of a director, concerning itself with the overall flow and pacing of the resulting multimedia performance. In order to build a music video, MusicStory must identify three features of the song. First, it must find the lyrics to guide the narrative. Second, it must identify the overall pace of the song. Finally, it must identify points of significant structural change in the music; currently, MusicStory searches for the points where a lead instrument or vocalist starts and stops in a performance.
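The latter two features can be approximated with off-the-shelf audio analysis. The following sketch assumes the open-source librosa library and uses onset strength as a rough stand-in for detecting where a lead instrument or vocalist starts and stops; it is illustrative only, as MusicStory's own audio internals are not detailed here.

import librosa

def analyze_song(path):
    # Load the recording as a mono waveform.
    y, sr = librosa.load(path, sr=None, mono=True)

    # Estimate the overall pace of the song in beats per minute.
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

    # Approximate points of significant structural change: large jumps
    # in spectral energy often coincide with a lead instrument or
    # vocalist entering or leaving the mix.
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    change_frames = librosa.onset.onset_detect(onset_envelope=onset_env, sr=sr)
    change_times = librosa.frames_to_time(change_frames, sr=sr)

    return tempo, change_times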
MusicStory's director is an Artistic Information Agent. This agent is a variant of the Information Management Assistant (Budzik, 2003) used in Information Retrieval. The agent's architecture includes several adapters that let it access online information sources, with its core functionality separated into four basic components. The Artistic Analyzers for MusicStory consist of a listener and a presenter. The listener feeds the audio information from a source into the agent, along with its metadata (some metadata, such as lyrics, is not carried within the source file). The presenter controls how the final movie is created. Table 1 outlines MusicStory's general workflow.
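As a rough illustration of this split, the listener and presenter could be organized as below. The sketch is hypothetical; the class and method names are illustrative stand-ins, not MusicStory's actual interfaces.

class Listener:
    """Feeds audio and associated metadata into the agent."""

    def __init__(self, audio_path, read_tags, lookup_lyrics):
        self.audio_path = audio_path
        self.read_tags = read_tags          # callable: path -> tag dict
        # Some metadata, such as lyrics, is not carried within the
        # source file, so lyrics come from an online repository.
        self.lookup_lyrics = lookup_lyrics  # callable: (artist, title) -> str

    def gather(self):
        tags = self.read_tags(self.audio_path)
        lyrics = self.lookup_lyrics(tags["artist"], tags["title"])
        return self.audio_path, tags, lyrics


class Presenter:
    """Controls how the final movie is created."""

    def render(self, images, timings, audio_path, out_path):
        # Sequence each image for its scheduled duration, then mux the
        # slideshow with the original audio track (left abstract here).
        raise NotImplementedError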
Fetch Metadata
To make music videos, MusicStory borrows from the Imagination Environment's information flow (Shamma, Owsley, Hammond, & Bradshaw, 2004). Starting with an audio file (MP3, WAV, WMA, etc.), the metadata must be extracted. MusicStory uses the populated metadata to identify the song, album, and artist names. If the audio file is not populated with metadata, MusicStory queries the user for the missing information. Alternatively, a music audio fingerprinting service, such as Shazam (Wang, 2003), may be used to identify the song.
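As a minimal sketch of this step, assuming the Python mutagen library for tag reading (the original implementation is not specified), the fallback to querying the user mirrors the behavior described above.

from mutagen import File

def fetch_metadata(path):
    """Read title, album, and artist tags; ask the user for any gaps."""
    audio = File(path, easy=True)  # parses tags for MP3 and other common formats
    metadata = {}
    for field in ("title", "album", "artist"):
        values = audio.get(field) if audio else None
        if values:
            metadata[field] = values[0]
        else:
            # The file is not populated with this tag; query the user.
            metadata[field] = input(f"Enter the song's {field}: ")
    return metadata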
Finding Lyrics
Speech-to-text on singing is a well-known unsolved problem (Mellody, Bartsch, & Wakefield, 2003), due to the nonstandard nature of the speech and the large amount of background noise (read: the musical accompaniment) present in the recording. In fact, many humans have great difficulty performing this task (see http://www.kissthisguy.com/ for examples of misheard lyrics). For this reason, we concentrated on finding song lyrics in online lyric repositories. There are many strategies for finding song lyrics from audio metadata like artist, title, and album information. Using a general-purpose search engine to find lyrics introduces difficulties, since the results are unstructured pages rather than clean lyric text. As an alternative, MusicStory uses Leos Lyrics, an online lyrics library whose specialized search engine allows a direct lookup from the metadata.
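A direct lookup of this kind reduces to a single parameterized request. The sketch below is hypothetical: the endpoint and its parameters are illustrative stand-ins, since Leos Lyrics' actual API is not described here.

import urllib.parse
import urllib.request

LYRICS_ENDPOINT = "http://example.com/lyrics/search"  # hypothetical stand-in

def find_lyrics(artist, title):
    # Build the query directly from the audio metadata; no free-text
    # Web search is involved.
    query = urllib.parse.urlencode({"artist": artist, "songtitle": title})
    with urllib.request.urlopen(f"{LYRICS_ENDPOINT}?{query}") as resp:
        return resp.read().decode("utf-8")  # raw lyrics payload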
Table 1. MusicStory's steps in music video creation

1. Fetch Metadata
2. Lyrics Search
3. Image Search
4. Pace/Beat Match
5. Find Vocal Segments
6. Determine Slide Transition and Duration Times
7. Make Slideshow
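Read top to bottom, these steps form a linear pipeline. The schematic below strings them together; each step is passed in as a callable, and all names are illustrative rather than MusicStory's actual API.

def make_music_video(audio_path, fetch_metadata, find_lyrics,
                     search_images, analyze_song, schedule_slides,
                     make_slideshow):
    # Each argument is a callable implementing one step of Table 1.
    tags = fetch_metadata(audio_path)                         # step 1
    lyrics = find_lyrics(tags["artist"], tags["title"])       # step 2
    images = search_images(lyrics)                            # step 3
    tempo, vocal_segments = analyze_song(audio_path)          # steps 4-5
    timings = schedule_slides(images, tempo, vocal_segments)  # step 6
    return make_slideshow(images, timings, audio_path)        # step 7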