from the time point of its completion). Here, when we search for “a picture in
which a person identified as Mr. X is shown,” Mr. X certainly appears in the
picture, but there is no guarantee that Mr. X is actually playing a central role.
Perhaps Mr. X is merely shown in the background while others are talking. In the
same way, when we search for “a picture in which Mr. X is talking,” a scene is
certainly found in which Mr. X's voice is heard, but the accompanying picture
may have no relationship to Mr. X.
From the viewpoint of video processing or audio processing, the above may be
sufficient, since each query involves only the image or only the speech. However,
when we consider extracting coherent information from a video database, such
single-medium indices are insufficient. In Section 3, we propose a “joint”
operation for the purpose of extracting semantically coherent video objects,
using these “minced” simple indexes as clues.
Next, suppose simple indexing is performed on each of two independent video
streams, and consider how to search for video objects spread over these two
streams. An example is video from a satellite conferencing system. With this
system, motion pictures and sounds are sent from two different base stations, and
a conference is held in which base stations at various locations participate. The
conference participants view video 1 and video 2 at the same time. In the
figure, video 1 and video 2 show the participants at two different locations
conversing with each other.
If each video is recorded on a separate medium, for example, two video tapes,
and then digitized, the two conference locations become logically separated, and
each of the two conversing participants appears in a video completely separate
from the other. Assuming simple indexing on each video, what method is adequate
for recognizing that these two people are actually talking to each other? This is
another example of extracting semantically coherent video objects with simple
indexes as clues. In Section 4, using this example, we show a solution and
demonstrate that the “joint” operation is effective for this purpose too.
3 Operations on heterogeneous media data
3.1 Retrieving video objects
We discuss here how to utilize indices constructed separately on heterogeneous
media data, such as motion pictures and audio, in order to retrieve meaningful
video fragments, which we call “video objects.” In the following discussion, we
assume that simple indices on the individual media are given in advance: for
example, indices on motion pictures representing which characters are on the
screen, and indices on audio representing which characters are talking.
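As a concrete illustration, such a simple index can be modeled as a list of time intervals during which a given condition holds for one medium. The sketch below (our own illustration, not the “joint” operation defined later; the names `intersect`, `on_screen`, and `talking` are hypothetical) shows how intervals in which Mr. X is both on the screen and talking could be obtained by pairwise intersection of a picture index and an audio index:

```python
# Illustrative sketch: a simple index per medium is a sorted list of
# (start, end) time intervals, in seconds, during which a condition holds.

def intersect(a, b):
    """Intersect two sorted, non-overlapping interval lists."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])   # latest start of the current pair
        hi = min(a[i][1], b[j][1])   # earliest end of the current pair
        if lo < hi:                  # non-empty overlap
            result.append((lo, hi))
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result

# Hypothetical indices for Mr. X on one video stream:
on_screen = [(0, 10), (25, 40)]   # picture index: Mr. X is on the screen
talking   = [(5, 15), (30, 35)]   # audio index: Mr. X is talking

# Intervals where both conditions hold simultaneously, a first
# approximation to "a picture in which Mr. X is shown while talking".
print(intersect(on_screen, talking))  # [(5, 10), (30, 35)]
```

Note that this naive intersection inherits the limitation discussed above: it combines indices only on a common time axis, and cannot by itself establish a semantic relationship between what is shown and what is heard.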
Generally speaking, we cannot retrieve video objects as long as we use the
indices on each medium individually. Let us consider the example shown in
Fig. 1. The figure depicts video data indexed with time intervals. The x-axis