Figure 5. (left) The mean log-frequency power coefficients for 80 instrumental and 75 vocal audio
segments. (right) The mean difference in log-frequency power coefficients between vocal and
instrumental audio segments (LFPC_vocal - LFPC_inst).
Figure 6. A waveform with annotated vocal and nonvocal segments.
Using Transitions
MusicStory creates a photo slideshow of the lyrics set to the source music.
The images follow a different narrative in the vocal and nonvocal segments.
During the vocal segments the lyrics tell a story, so to preserve that story
the images are shown in the order of the lyrics. During the nonvocal segments,
a more general thematic slideshow is shown, built from the most frequent lyric
terms and ordered by frequency (see Figure 7). An introduction segment is added
with images from a search for the band's name and the song's title.
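
The ordering rules described here can be illustrated with a minimal Python sketch. It is not MusicStory's actual code: the function names, the stopword list, and the `top_n` cutoff are all hypothetical, and the image search itself is left out. The sketch only shows how the three query lists (introduction, vocal, nonvocal) might be derived from the lyrics.

```python
from collections import Counter

# Hypothetical stopword list; the source does not say how terms are filtered.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def lyric_terms(lyrics):
    """Return lowercase content words in the order they occur in the lyrics."""
    words = (w.strip(".,!?;:\"'()").lower() for w in lyrics.split())
    return [w for w in words if w and w not in STOPWORDS]


def build_queries(band, title, lyrics, top_n=5):
    """Derive image-search terms for each kind of segment.

    intro:    band name and song title
    vocal:    lyric terms in lyric order, to preserve the story
    nonvocal: the top_n most frequent terms, most frequent first
    """
    terms = lyric_terms(lyrics)
    frequent = [term for term, _ in Counter(terms).most_common(top_n)]
    return {"intro": [band, title], "vocal": terms, "nonvocal": frequent}


if __name__ == "__main__":
    queries = build_queries("Some Band", "Some Song", "rain rain rain, sun sun, moon")
    print(queries["vocal"])
    print(queries["nonvocal"])
```

Keeping the vocal list in lyric order and the nonvocal list in frequency order mirrors the two narratives described in the text; how terms are matched to individual audio segments is not specified in the source and is not modeled here.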
To make the actual transitions, MusicStory uses the hint from our RMS pace
estimate. From previous work, we found that an image needs to be visible for
at least 900 ms for the viewer to see it and grasp its association (Shamma et
al., 2004). The agent adjusts each photo's duration and the speed of the
dissolve transition between photos using the RMS pace hint. The initial
version of MusicStory uses the hint to select one of three categories: slow,
medium, or fast. Each category has a preset slide duration and transition
speed; Table 3 shows the average display time per frame and the number of
transition frames for each category. These duration and transition times
follow common video direction practice (Groening & Cohen, 2002). The output
video file contains the audio in the same format as the source audio or a
similar one. The video can be encoded with any variable bit-rate codec
suitable for JPEG-based videos; we chose Microsoft's ImageVideo Codec v9,
which is designed for this style of application.
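
As a rough illustration of the category selection, the following Python sketch maps a pace hint to one of the three presets and enforces the 900 ms visibility floor from the text. The preset durations, transition frame counts, thresholds, and the assumption that the hint is normalized to the range 0 to 1 are all hypothetical; the actual values are those given in Table 3.

```python
from dataclasses import dataclass

# From the text: an image must stay visible at least this long.
MIN_VISIBLE_MS = 900


@dataclass(frozen=True)
class PacePreset:
    slide_ms: int           # display time per photo
    transition_frames: int  # length of the dissolve between photos


# Hypothetical placeholder values, NOT the figures from Table 3.
PRESETS = {
    "slow": PacePreset(slide_ms=4000, transition_frames=30),
    "medium": PacePreset(slide_ms=2500, transition_frames=20),
    "fast": PacePreset(slide_ms=1200, transition_frames=10),
}


def categorize(pace_hint, slow_below=0.33, fast_above=0.66):
    """Map a pace hint (assumed normalized to 0..1) to a category name."""
    if pace_hint < slow_below:
        return "slow"
    if pace_hint > fast_above:
        return "fast"
    return "medium"


def slide_timing(pace_hint):
    """Return the preset for the hint, never below the visibility floor."""
    preset = PRESETS[categorize(pace_hint)]
    return PacePreset(max(preset.slide_ms, MIN_VISIBLE_MS), preset.transition_frames)


if __name__ == "__main__":
    for hint in (0.1, 0.5, 0.9):
        print(hint, categorize(hint), slide_timing(hint))
```

Clamping in `slide_timing` rather than in the preset table means the 900 ms floor still holds if the presets are later tuned.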
Targeting Demographics
By looking at onsets in the amplitude and frequency domains, we can determine
the overall pace of a song and some vocal segmentation information. Our
results are positive, albeit fragile with respect to our demographic. We
targeted MusicStory at a 25-to-35-year-old, iPodded, techno-savvy group,
which implies a certain genre