The first issue is how to meet users' needs when they want to retrieve
their thousands of earlier photos or images. Jeon et al. [16] proposed
automatic image annotation and retrieval using a Cross-Media Relevance
Model (CMRM). Nontext media (images, video, and audio) may have little
value if not annotated with additional text. Manual text annotation of
images, however, is laborious, and complex queries become difficult to
fulfill. With automatic image annotation, a particular image can be
retrieved more easily. There are two ways the CMRM can be
used. First, the blobs corresponding to each test image are used to generate
words and associated probabilities from the joint distribution of blobs and
words, which corresponds to document-based expansion. Each test image
can then be annotated with a vector of probabilities over all of the words
in the vocabulary. This is referred to as the Probabilistic Annotation-Based
Cross-Media Relevance Model (PACMRM). This model is useful for ranked
retrieval, but its output is less convenient for people to read. A variant
is the Fixed Annotation-Based Cross-Media Relevance Model (FACMRM), which
keeps only a fixed number of the most probable words for each image. This
is not useful for ranked retrieval but is easy for people to use when the
number of annotations
is small. Second, a query word (or multiple words) is used to generate a set of
blob probabilities from the joint distribution of blobs and words,
corresponding to query expansion. This vector of blob probabilities is
compared with the vector of blobs for each test image using
Kullback-Leibler (KL) divergence, and the resulting KL distance is used to
rank the images. The authors call
this model the Direct-Retrieval Cross-Media Relevance Model (DRCMRM).
There is room to improve the accuracy and reliability of this technique.
Existing automatic image annotation techniques usually associate common
words with several different image regions. As a result, uncommon words
have little chance of being used to annotate images, which gives
inaccurate results for queries. One proposed remedy is to raise the number
of blobs that are associated with uncommon words. It is also possible to
combine text anthologies with image features to improve current automatic
image annotation techniques.
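
The two uses of the model described above can be illustrated with a small Python sketch. It is not the implementation from [16]: it assumes the word probabilities for an image and the blob distributions have already been estimated, and the function names, the toy data, the smoothing constant eps, and the direction of the KL divergence (query against image) are all assumptions made for illustration. It shows a fixed-length annotation taken from a probability vector, and a ranking of images by KL divergence from a query's blob distribution.

```python
import math


def fixed_annotation(word_probs, k):
    """Keep the k most probable words (a fixed-length, FACMRM-style annotation).

    word_probs is a hypothetical dict mapping each vocabulary word to its
    estimated probability for one image (the probabilistic annotation).
    """
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps (an
    assumed smoothing constant) to avoid division by zero.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


def rank_images(query_blob_probs, image_blob_dists):
    """Order image ids by increasing KL divergence from the query distribution."""
    scores = {
        image_id: kl_divergence(query_blob_probs, dist)
        for image_id, dist in image_blob_dists.items()
    }
    return sorted(scores, key=scores.get)


# Toy data, invented for illustration only.
word_probs = {"tiger": 0.45, "grass": 0.30, "sky": 0.15, "water": 0.10}
query_blob_probs = [0.7, 0.2, 0.1]
image_blob_dists = {
    "img_a": [0.6, 0.3, 0.1],
    "img_b": [0.1, 0.2, 0.7],
    "img_c": [0.7, 0.2, 0.1],
}

annotation = fixed_annotation(word_probs, 2)
ranking = rank_images(query_blob_probs, image_blob_dists)
print(annotation)
print(ranking)
```

In this toy run the full probability vector plays the role of the probabilistic annotation, while the two-word list is the fixed annotation a person would read. The image whose blob distribution matches the query exactly has zero divergence and is ranked first.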
Another issue is how to retrieve audio (music, sound, humming, and
voice) from the database. Liu et al. [17] proposed an approach to retrieve
MP3 music objects and voice-based objects based on their energy
distributions. In their method, an MP3 phase is defined as the logical unit
for indexing MP3 objects. After an object is inserted into the MP3 music
database, it is segmented into a sequence of MP3 phase units. They used
PCVs (Polyphase Filter Bank Coefficient Vectors) as discriminators for each
MP3 phase. The PCV of an MP3 slot represents the average energy distribution
across the 32 sub-bands; therefore a certain pitch error can be tolerated.
The PCV of an MP3
slot is also designed to identify any sudden change in pitch or volume within
the whole MP3 phase. The MP3 similarity measurement function is used
to retrieve the selected MP3 phases. There are several disadvantages of the
proposed method: only MP3 audio can be tested and not any other type such