This requires a syllable-by-syllable time-stamped
lyric file to be associated with the audio file. Since
such files must be created manually through a
time-consuming process, this limits the wider
applicability of P-Karaoke.
In addition to P-Karaoke, Microsoft Research
Asia developed Photo2Video (Hua, Lu, et al.
2003). Starting with the user selecting a set of
photographs or a photographic series, Photo2Video
identifies a set of key-frame sequences.
With the key frames, several motion trajectories
are generated in a Ken Burns documentary
style. Here the camera pans across several salient
elements in the photo as determined through
face recognition or attention maps. The motion
sequences are then aligned with a user-specified
song, using event onsets detected in the frequency
domain of the song as starting candidate points
for the sequences.
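The kind of frequency-domain onset detection described above can be sketched with a simple spectral-flux detector. This is a minimal stand-in for Photo2Video's actual analysis, not its implementation; the frame size, hop size, and threshold values are illustrative assumptions:

```python
import numpy as np

def onset_candidates(samples, sr, frame=1024, hop=512, threshold=1.5):
    """Detect onset candidates via spectral flux: frames where the
    positive change in the magnitude spectrum exceeds a running mean."""
    window = np.hanning(frame)
    n_frames = 1 + (len(samples) - frame) // hop
    mags = np.array([
        np.abs(np.fft.rfft(window * samples[i * hop : i * hop + frame]))
        for i in range(n_frames)
    ])
    # Spectral flux: sum of positive spectral differences between frames.
    flux = np.maximum(mags[1:] - mags[:-1], 0).sum(axis=1)
    # Keep frames whose flux exceeds `threshold` times the local mean.
    local_mean = np.convolve(flux, np.ones(9) / 9, mode="same")
    peaks = np.where(flux > threshold * (local_mean + 1e-9))[0] + 1
    return peaks * hop / sr  # candidate onset times in seconds

# Usage: a synthetic click train produces onset candidates near the clicks.
sr = 22050
clicks = np.zeros(sr * 2)
clicks[::sr // 2] = 1.0
print(onset_candidates(clicks, sr))  # prints detected onset times in seconds
```

Each detected time could then serve as a candidate starting point for one of the generated motion sequences.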
Existing commercial music video creators,
such as Muvee's AutoProducer and Microsoft's
PhotoStory 3, require significant user interaction to
create the video. Typically, the user must specify
the song's speed and hand-select the desired local
photos. These systems provide assisted music
video creation; they construct photo narratives
with an audio soundtrack.
MusicStory produces videos autonomously,
building a video from personal photo collections
when possible and public collections when needed.
MusicStory discovers images (locally or online)
linked to the words in the lyrics. The end result is
a video that brings new and unexpected imagery
to the viewer, based on textually indexed images
related to the song itself. Since MusicStory's
images are semantically tied (via Web-indexing
and community tagging) to the song and its lyrics,
it brings a new experience, a musical narrative
with discovered imagery that requires no work
on the user's part. The following section describes
MusicStory in depth.
MusicStory
Like a human listener, MusicStory processes the
lyrics in the music, and these lyrics bring forth
associations with images. The imagery chosen by
MusicStory is defined by the set of links between
words drawn from the lyrics and image-word as-
sociations contained in a social network, either
private or public. As the images are found, it
presents them to the audience, creating an on-the-fly
music video, heightening, clarifying, and exposing
the connections between words, ideas, and images
that we are often unaware of, until shown. Figure
1 shows a slide-by-slide expansion of the imagery
from the lyrics of a Radiohead song.
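The association step described above, linking lyric words to tagged images, can be sketched roughly as follows. The `tag_index` mapping, the stopword list, and the `storyboard` helper are hypothetical stand-ins for queries against a real photo-sharing service, not the system's actual code:

```python
# Minimal stopword list; a real system would use a fuller set.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "i"}

def salient_words(lyric_line):
    """Keep content words from a lyric line as image-query candidates."""
    words = [w.strip(".,!?").lower() for w in lyric_line.split()]
    return [w for w in words if w and w not in STOPWORDS]

def storyboard(lyrics, tag_index):
    """Pair each lyric line with the first image whose community tag
    matches one of its salient words, yielding slides in song order."""
    slides = []
    for line in lyrics:
        image = None
        for word in salient_words(line):
            if word in tag_index:
                image = tag_index[word][0]
                break
        slides.append((line, image))
    return slides

# Hypothetical tag index and lyric fragment for illustration only.
tags = {"rain": ["rain_01.jpg"], "street": ["street_07.jpg"]}
print(storyboard(["Rain falls on the street", "No matching words"], tags))
```

In the sketch, lines with no tag match yield an empty slide; the deployed system instead falls back from private to public collections.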
The image associations that MusicStory
presents amplify the emotional experience and
heighten its visceral appeal by externalizing the
concrete imagery intrinsic in the lyrics. Some
images depict the expected relation found in the
song, while others present a juxtaposition between
the song's meaning and the meaning found within
the social network. Our approach focuses on the
creation of a photo narrative that complements the
music, rather than on the strict alignment of images
to lyrics, as we have shown in the Imagination
Environment. Artistically speaking, the strict align-
ment of image-word pairs to lyrics or spoken dialog
provides an amplification of meaning through
free association (Shamma, 2005). For MusicStory,
we rely on this amplification of meaning in the
context of the song itself, and not the individual
words being communicated.
MusicStory uses public media to retrieve
images with popular relevance, relying on Web
frequency as a measure of familiarity and salience
(Shamma, Owsley, Bradshaw, & Hammond,
2004), returning images that reflect current pop-
culture meanings. More personal images can be
found by focusing the retrieval to smaller social
networks, such as personal photo-sharing sites.
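Using Web frequency as a measure of familiarity amounts to ranking candidate query words by how often they occur; a minimal sketch, assuming hypothetical hit counts standing in for real Web-frequency measurements:

```python
def rank_by_familiarity(candidates, hit_counts):
    """Order candidate query words from most to least familiar,
    so the most culturally salient imagery is retrieved first."""
    return sorted(candidates, key=lambda w: hit_counts.get(w, 0), reverse=True)

# Illustrative counts, not actual measurements.
counts = {"karma": 120_000, "police": 950_000, "arrest": 40_000}
print(rank_by_familiarity(["arrest", "karma", "police"], counts))
# → ['police', 'karma', 'arrest']
```

Restricting `hit_counts` to a personal photo-sharing account rather than the open Web would shift the ranking toward personally familiar imagery.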