Querying Multimedia Data by Similarity in Relational DBMS - Advanced Database Query Systems

cept of dimensions do not apply (as, for example, shapes defined by polygons with different numbers of vertices). Formally, a metric space is a pair ⟨𝕊, δ⟩, where 𝕊 is the set of all objects complying with the properties of the domain and δ is a distance function that complies with the following three properties: symmetry: δ(s_1, s_2) = δ(s_2, s_1); non-negativity: 0 < δ(s_1, s_2) < ∞ if s_1 ≠ s_2 and δ(s_1, s_1) = 0; and triangular inequality: δ(s_1, s_2) ≤ δ(s_1, s_3) + δ(s_3, s_2), ∀ s_1, s_2, s_3 ∈ 𝕊. A function that satisfies these properties is called a metric. The Minkowski distances with p ≥ 1 are metrics; therefore, vector spaces equipped with any of these functions are special cases of metric spaces. Another important property of metric spaces is that they allow the development of fast indexing structures (see the 'Indexing Methods for Multimedia' section). Other examples of metrics are the Canberra distance (Kokare et al., 2003) and the Weak Attribute Interaction Distance (WAID), which allows users to define the influence between features according to their perception (Felipe et al., 2009).
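The three metric properties can be checked empirically on a small sample of objects. The following Python code is only a minimal sketch under stated assumptions: the function names (minkowski, canberra, satisfies_metric_properties) and the sample points are illustrative and do not come from the chapter, and the Canberra formula used is the common textbook form, which may differ in detail from the variant in Kokare et al. (2003). Passing the check on a finite sample does not prove that a function is a metric; it only fails to find a counterexample.

```python
from itertools import product
from math import isclose


def minkowski(a, b, p=2.0):
    """Minkowski (L_p) distance between two equal-length vectors."""
    if p < 1:
        raise ValueError("p must be >= 1 for the Minkowski distance to be a metric")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def canberra(a, b):
    """Canberra distance; coordinates where both values are zero contribute 0."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total


def satisfies_metric_properties(points, dist, tol=1e-9):
    """Look for violations of the three metric properties among `points`."""
    for s1, s2 in product(points, repeat=2):
        d12 = dist(s1, s2)
        if not isclose(d12, dist(s2, s1), abs_tol=tol):
            return False  # symmetry violated
        if s1 == s2 and not isclose(d12, 0.0, abs_tol=tol):
            return False  # delta(s1, s1) must be 0
        if s1 != s2 and d12 <= 0:
            return False  # distinct objects must be at a positive distance
    for s1, s2, s3 in product(points, repeat=3):
        if dist(s1, s2) > dist(s1, s3) + dist(s3, s2) + tol:
            return False  # triangular inequality violated
    return True


# Hypothetical two-dimensional feature vectors.
sample = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (2.0, 0.5)]
print(minkowski(sample[0], sample[1], p=2))   # Euclidean distance
print(minkowski(sample[0], sample[1], p=1))   # Manhattan distance
print(satisfies_metric_properties(sample, minkowski))
print(satisfies_metric_properties(sample, canberra))
```

A function such as the squared Euclidean distance fails this check on collinear points, because it violates the triangular inequality.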

Distance functions can be affected by weighting techniques, which produce distinct similarity space instances and tune the evaluation. These techniques can be classified into two groups: feature weighting and partial distance weighting. Feature weighting aims to establish the balance among the relevance of each feature that best satisfies the user's needs. The trivial strategy for weighting features is based on exhaustive experimental evaluation. Nonetheless, there is an increasing number of approaches dynamically guided by information provided in the query formulation and/or in relevance feedback cycles (Liu et al., 2007; Wan and Liu, 2006; Lee and Street, 2002). Partial distance weighting is employed when an object is represented by many feature vectors. In this case, the similarity evaluation between two objects first computes the (partial) distance between each pair of corresponding feature vectors, usually employing distinct distance functions, and then uses another function to aggregate these values into the final distance. The automatic partial distance weighting methods can be classified into supervised (e.g., Bustos et al., 2004) and unsupervised (e.g., Bueno et al., 2009).
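The two weighting techniques can be illustrated with a short Python sketch. It rests on several assumptions that are not in the chapter: feature weighting is shown as a weighted Euclidean distance, the aggregation function for partial distances is a plain weighted sum, and the weights, feature vectors, and function names are hypothetical. The cited methods choose weights and aggregation functions in more elaborate ways.

```python
from math import sqrt


def weighted_euclidean(a, b, weights):
    """Euclidean distance in which feature k is scaled by a non-negative weight."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def manhattan(a, b):
    """L1 distance, used here as a second partial distance function."""
    return sum(abs(x - y) for x, y in zip(a, b))


def aggregated_distance(obj_a, obj_b, partial_dists, partial_weights):
    """Weighted sum of the partial distances between corresponding feature vectors.

    obj_a and obj_b are sequences of feature vectors (one per extractor);
    partial_dists[k] is the distance function applied to the k-th pair.
    """
    return sum(
        w * dist(va, vb)
        for va, vb, dist, w in zip(obj_a, obj_b, partial_dists, partial_weights)
    )


def colour_dist(a, b):
    """Feature weighting: all three colour bins weighted equally (an assumption)."""
    return weighted_euclidean(a, b, [1.0, 1.0, 1.0])


# Hypothetical objects: a 3-bin colour histogram and a 2-value texture vector.
img_a = ([0.2, 0.5, 0.3], [1.0, 4.0])
img_b = ([0.1, 0.6, 0.3], [2.0, 2.0])

# Partial distance weighting: colour counts for 70% and texture for 30%.
print(aggregated_distance(img_a, img_b, [colour_dist, manhattan], [0.7, 0.3]))
```

Changing the feature weights or the partial weights yields a different similarity space instance over the same objects, which is what the tuning described above exploits.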

Now that we know how to represent multimedia objects and compare them by similarity, it is time to learn how to query these data. Several types of similarity queries can be employed to query multimedia data. They are discussed in the next section.

Similarity Queries

Let us recall a few fundamental concepts of the relational model to provide a proper definition of similarity queries following database theory. It is worth stressing that every traditional concept of the relational model remains valid when retrieving multimedia objects by the similarity of their contents.

Suppose R is a relation with n attributes described by a relational schema R = (S_1, ..., S_n), composed of a set of m tuples t_i, such that R = {t_1, …, t_m}. Each attribute S_j, 1 ≤ j ≤ n, indicates a role for domain 𝕊_j, that is, S_j ⊂ 𝕊_j. Therefore, when 𝕊_j is the multimedia domain from a metric space, each attribute S_j stores multimedia values. Each tuple of the relation stores one value for each attribute S_j, where each value s_i, 1 ≤ i ≤ m, assigned to S_j is an element taken from domain 𝕊_j, and the dataset S_j is composed of the set of elements s_i that are assigned to the attribute S_j in at least one tuple of the stored relation. Notice that more than one attribute S_j, S_k from R can share the same domain, that is, it is possible to have 𝕊_j = 𝕊_k. Regarding the multimedia domain from a metric space, the elements must be compared by similarity, using a distance function δ defined over the respective domain. Elements can be compared using the properties of the domain, regardless of the attributes that store the elements. Therefore, every pair
