For example, the word "home" from the song Sweet Home Alabama retrieves different associations from different repositories. Google's ranking returns a canonical photo of a home from a realtor's Web site. Flickr, in contrast, returns photos people took in their own homes, in this case of their child. The combination of image repositories (popular, canonical, and personal) provides a balance of associations which the agent uses to aid in the art's creation and to assist the audience's understanding of the work (Kandinsky, 1994; Shamma, 2005).
While it is possible to use simple search to create a sequence of images overlaying music, that alone does not make for a successful piece. Successful integration of sound and image relies on an intimate knowledge of the media itself, considering the available image resources (repositories, number of images per term), the musical parameters (tempo, dynamics, density, lyrics), and the output format (screen size, playback bit rate). To do this, MusicStory assumes the role of a director, concerning itself with the overall flow and pacing of the resulting multimedia performance. In order to build a music video, MusicStory must identify three features of the song. First, it must find the lyrics to guide the narrative. Second, it must identify the overall pace of the song. Finally, it must identify points of significant structural change in the music; currently, MusicStory searches for the points where a lead instrument or vocalist starts and stops in a performance.
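The latter two features can be approximated with off-the-shelf audio analysis. The following sketch assumes the open-source librosa library and uses onset strength as a rough stand-in for detecting where a lead instrument or vocalist starts and stops; it is illustrative only, as MusicStory's own audio internals are not detailed here.

import librosa

def analyze_song(path):
    # Load the recording as a mono waveform.
    y, sr = librosa.load(path, sr=None, mono=True)

    # Estimate the overall pace of the song in beats per minute.
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

    # Approximate points of significant structural change: large jumps
    # in spectral energy often coincide with a lead instrument or
    # vocalist entering or leaving the mix.
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    change_frames = librosa.onset.onset_detect(onset_envelope=onset_env, sr=sr)
    change_times = librosa.frames_to_time(change_frames, sr=sr)

    return tempo, change_times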
MusicStory's director is an Artistic Information Agent. This agent is a variant of the Information Management Assistant (Budzik, 2003) used in Information Retrieval. The agent's architecture includes several adapters that let it access online information sources, with its core functionality separated into four basic components. The Artistic Analyzers for MusicStory consist of a listener and a presenter. The listener feeds the audio information from a source into the agent, along with its metadata (some metadata, such as lyrics, is not carried within the source file). The presenter controls how the final movie is created. Table 1 outlines MusicStory's general workflow.
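As a rough illustration of this split, the listener and presenter could be organized as below. The sketch is hypothetical; the class and method names are illustrative stand-ins, not MusicStory's actual interfaces.

class Listener:
    """Feeds audio and associated metadata into the agent."""

    def __init__(self, audio_path, read_tags, lookup_lyrics):
        self.audio_path = audio_path
        self.read_tags = read_tags          # callable: path -> tag dict
        # Some metadata, such as lyrics, is not carried within the
        # source file, so lyrics come from an online repository.
        self.lookup_lyrics = lookup_lyrics  # callable: (artist, title) -> str

    def gather(self):
        tags = self.read_tags(self.audio_path)
        lyrics = self.lookup_lyrics(tags["artist"], tags["title"])
        return self.audio_path, tags, lyrics


class Presenter:
    """Controls how the final movie is created."""

    def render(self, images, timings, audio_path, out_path):
        # Sequence each image for its scheduled duration, then mux the
        # slideshow with the original audio track (left abstract here).
        raise NotImplementedError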
Fetch Metadata
To make music videos, MusicStory borrows from the Imagination Environment's information flow (Shamma, Owsley, Hammond, & Bradshaw, 2004). Starting with an audio file (MP3, WAV, WMA, etc.), the metadata must be extracted. MusicStory uses the populated metadata to identify the song, album, and artist names. If the audio file is not populated with metadata, MusicStory queries the user for the missing information. Alternatively, a music audio fingerprinting service, such as Shazam (Wang, 2003), may be used to identify the song.
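As a minimal sketch of this step, assuming the Python mutagen library for tag reading (the original implementation is not specified), the fallback to querying the user mirrors the behavior described above.

from mutagen import File

def fetch_metadata(path):
    """Read title, album, and artist tags; ask the user for any gaps."""
    audio = File(path, easy=True)  # parses tags for MP3 and other common formats
    metadata = {}
    for field in ("title", "album", "artist"):
        values = audio.get(field) if audio else None
        if values:
            metadata[field] = values[0]
        else:
            # The file is not populated with this tag; query the user.
            metadata[field] = input(f"Enter the song's {field}: ")
    return metadata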
Finding Lyrics
Speech-to-text on singing is a well-known unsolved problem (Mellody, Bartsch, & Wakefield, 2003), due to the nonstandard nature of the speech and the large amount of background noise (read: the musical accompaniment) present in the recording. In fact, many humans have great difficulty performing this task (see http://www.kissthisguy.com/ for examples of misheard lyrics). For this reason, we concentrated on finding song lyrics in online lyric repositories. There are many strategies for finding song lyrics from audio metadata like artist, title, and album information. Using a general-purpose search engine to find lyrics introduces difficulties, since the results are unstructured pages rather than clean lyric text. As an alternative, MusicStory uses Leos Lyrics, an online lyrics library whose specialized search engine allows a direct lookup from the metadata.
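A direct lookup of this kind reduces to a single parameterized request. The sketch below is hypothetical: the endpoint and its parameters are illustrative stand-ins, since Leos Lyrics' actual API is not described here.

import urllib.parse
import urllib.request

LYRICS_ENDPOINT = "http://example.com/lyrics/search"  # hypothetical stand-in

def find_lyrics(artist, title):
    # Build the query directly from the audio metadata; no free-text
    # Web search is involved.
    query = urllib.parse.urlencode({"artist": artist, "songtitle": title})
    with urllib.request.urlopen(f"{LYRICS_ENDPOINT}?{query}") as resp:
        return resp.read().decode("utf-8")  # raw lyrics payload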
Table 1. MusicStory's steps in music video creation

1. Fetch Metadata
2. Lyrics Search
3. Image Search
4. Pace/Beat Match
5. Find Vocal Segments
6. Determine Slide Transition and Duration Times
7. Make Slideshow
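Read top to bottom, these steps form a linear pipeline. The schematic below strings them together; each step is passed in as a callable, and all names are illustrative rather than MusicStory's actual API.

def make_music_video(audio_path, fetch_metadata, find_lyrics,
                     search_images, analyze_song, schedule_slides,
                     make_slideshow):
    # Each argument is a callable implementing one step of Table 1.
    tags = fetch_metadata(audio_path)                         # step 1
    lyrics = find_lyrics(tags["artist"], tags["title"])       # step 2
    images = search_images(lyrics)                            # step 3
    tempo, vocal_segments = analyze_song(audio_path)          # steps 4-5
    timings = schedule_slides(images, tempo, vocal_segments)  # step 6
    return make_slideshow(images, timings, audio_path)        # step 7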