A Peer-to-Peer Digital Human Life Memory Store for Sharing Serendipitous Moments - Ubiquitous Multimedia Computing

The first issue is how to meet users' needs when they want to retrieve their thousands of earlier photos or images. Jeon et al. [16] proposed automatic image annotation and retrieval using a Cross-Media Relevance Model (CMRM). Non-text media (images, video, and audio) may have little value if not annotated with additional text. Annotating images manually, however, is not easy, and it becomes difficult to fulfill complex queries. With automatic image annotation, a particular image can be retrieved easily. There are two ways the CMRM can be used. First, the blobs corresponding to each test image are used to generate words and associated probabilities from the joint distribution of blobs and words, which corresponds to document-based expansion. Each test image can then be annotated with a vector of probabilities over all of the words in the vocabulary. This is referred to as the Probabilistic Annotation-Based Cross-Media Relevance Model (PACMRM). This model is useful for ranked retrieval, but its output is less useful for people to read. A variant is the Fixed Annotation-Based Cross-Media Relevance Model (FACMRM), which is not useful for ranked retrieval but is easy for people to use when the number of annotations is small. Second, a query word (or multiple words) is used to generate a set of blob probabilities from the joint distribution of blobs and words, which corresponds to query expansion. This vector of blob probabilities is compared with the vector of blobs for each test image using Kullback-Leibler (KL) divergence, and the resulting KL divergence is used to rank the images. They call this model the Direct-Retrieval Cross-Media Relevance Model (DRCMRM).
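The two uses described above can be illustrated with a minimal sketch. It assumes the blob and word probabilities have already been estimated elsewhere. The function names, the smoothing constant `eps`, and the toy data are hypothetical and are not taken from Jeon et al. [16]. The first part ranks images by KL divergence between a query's blob distribution and each image's blob distribution, in the spirit of DRCMRM. The second part reduces a probabilistic annotation to a fixed number of words, in the spirit of FACMRM.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions over the same blob vocabulary.

    eps is an assumed smoothing constant that keeps the logarithm finite
    when q contains zeros; it is not part of the original model.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def rank_images(query_blob_probs, image_blob_probs):
    """Return image ids ordered from smallest to largest KL divergence."""
    scores = {
        image_id: kl_divergence(query_blob_probs, blob_probs)
        for image_id, blob_probs in image_blob_probs.items()
    }
    return sorted(scores, key=scores.get)


def fixed_annotation(word_probs, k):
    """Keep the k most probable words from a probabilistic annotation."""
    return sorted(word_probs, key=word_probs.get, reverse=True)[:k]


# Toy data (hypothetical): a vocabulary of three blobs and three test images.
query = [0.7, 0.2, 0.1]
images = {
    "beach.jpg": [0.6, 0.3, 0.1],
    "forest.jpg": [0.1, 0.2, 0.7],
    "city.jpg": [0.3, 0.4, 0.3],
}
ranking = rank_images(query, images)

# Toy probabilistic annotation (hypothetical) reduced to two fixed words.
annotation = {"sky": 0.40, "water": 0.35, "sand": 0.15, "tree": 0.10}
top_words = fixed_annotation(annotation, k=2)

print(ranking)
print(top_words)
```

The image whose blob distribution is closest to the query's comes first in `ranking`. The fixed annotation discards the probabilities, which is why it suits human readers better than ranked retrieval.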

There is room to improve the accuracy and reliability of this technique. Existing automatic image annotation techniques usually associate common words with several different image regions. As a result, uncommon words have little chance of being used to annotate images, which gives inaccurate results for queries. One proposed solution is to raise the number of blobs that are associated with uncommon words. It is also possible to combine text anthologies with image features to improve current automatic image annotation techniques.

Another issue is how to retrieve audio (music, sound, humming, and voice) from the database. Liu et al. [17] proposed an approach that retrieves MP3 music objects and voice-based objects based on their energy distributions. In their method, they define an MP3 phase as the logical unit for indexing MP3 objects. After an object is inserted into the MP3 music database, it is segmented into a sequence of MP3 phases. They used PCVs (Polyphase-Filter Bank Coefficient Vectors) as discriminators for each MP3 phase. The PCV of an MP3 slot represents the average energy distribution across the 32 sub-bands; therefore a certain pitch error can be tolerated. The PCV of an MP3 slot is also designed to identify any sudden change in pitch or volume within the whole MP3 phase. The MP3 similarity measurement function is used to retrieve the selected MP3 phases. There are several disadvantages of the proposed method: only MP3 audio can be tested and not any other type such
