from the time point of its completion). Here, when we search for “a picture in
which a person identified as Mr. X is shown,” Mr. X certainly appears in the
picture, but there is no guarantee that Mr. X is actually playing a central role.
Perhaps Mr. X is merely shown in the background while others are talking. In the
same way, when we search for “a picture in which Mr. X is talking,” a scene is
certainly found in which Mr. X's voice is heard, but the accompanying picture
may have no relationship to Mr. X.
From the viewpoint of video processing or audio processing, the above may be
sufficient, since each query involves only the image or only the speech. However,
when we consider extracting coherent information from a video database, such
single-medium indices are insufficient. In Section 3, we propose a “joint”
operation for the purpose of extracting semantically coherent video objects,
using these “minced” simple indexes as clues.
Next, suppose simple indexing is performed on each of two independent video
streams, and consider how to search for video objects spread over these two
streams. An example is video from a satellite conferencing system. With this
system, motion pictures and sounds are sent from two different base stations, and
a conference is held in which base stations at various locations participate. The
conference participants view video 1 and video 2 at the same time. In the
figure, video 1 and video 2 show the participants at two different locations
conversing with each other.
If each video is recorded on a separate medium, for example, two video tapes,
and then digitized, the two conference locations become logically separated, and
each of the two conversing participants appears in a video completely separate
from the other. Assuming simple indexing on each video, what method is adequate
for recognizing that these two people are actually talking to each other? This is
another example of extracting semantically coherent video objects with simple
indexes as clues. In Section 4, using this example, we show a solution and
demonstrate that the “joint” operation is effective for this purpose too.
3 Operations on heterogeneous media data
3.1 Retrieving video objects
We discuss here how to utilize indices constructed separately on heterogeneous
media data, such as motion pictures and audio, in order to retrieve meaningful
video fragments, which we call “video objects.” In the following discussion, we
assume that simple indices on the individual media are given in advance: for
example, indices on motion pictures representing which characters are on the
screen, and indices on audio representing which characters are talking.
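As a concrete illustration, such a simple index can be modeled as a list of time intervals during which a given condition holds for one medium. The sketch below (our own illustration, not the “joint” operation defined later; the names `intersect`, `on_screen`, and `talking` are hypothetical) shows how intervals in which Mr. X is both on the screen and talking could be obtained by pairwise intersection of a picture index and an audio index:

```python
# Illustrative sketch: a simple index per medium is a sorted list of
# (start, end) time intervals, in seconds, during which a condition holds.

def intersect(a, b):
    """Intersect two sorted, non-overlapping interval lists."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])   # latest start of the current pair
        hi = min(a[i][1], b[j][1])   # earliest end of the current pair
        if lo < hi:                  # non-empty overlap
            result.append((lo, hi))
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result

# Hypothetical indices for Mr. X on one video stream:
on_screen = [(0, 10), (25, 40)]   # picture index: Mr. X is on the screen
talking   = [(5, 15), (30, 35)]   # audio index: Mr. X is talking

# Intervals where both conditions hold simultaneously, a first
# approximation to "a picture in which Mr. X is shown while talking".
print(intersect(on_screen, talking))  # [(5, 10), (30, 35)]
```

Note that this naive intersection inherits the limitation discussed above: it combines indices only on a common time axis, and cannot by itself establish a semantic relationship between what is shown and what is heard.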
Generally speaking, we cannot retrieve video objects as long as we use the
indices on each medium individually. Let us consider the example shown in
Fig. 1. The figure depicts video data indexed with time intervals. The x-axis