Querying Multimedia Data by Similarity in Relational DBMS - Advanced Database Query Systems

Database Reference

In-Depth Information

consider to query multimedia data is related to using indexing methods specially tailored to efficiently

answer similarity queries.

Indexing methods, such as B-tree and its variations, and hashing structures (a description of these

methods can be found in (Garcia-Molina et al., 2002)) are normally provided by DBMS. However,

although these indexing methods are enough to attend the needs of traditional application users, they

are not suitable for systems that require similarity searches, which usually deal with data that present

high dimensionality and do not present the total ordering property. In order to answer similarity queries

in generic metric spaces the most suitable indexing structures are the Metric Access Methods (MAM).

A MAM is a distance-based indexing method that employs only metrics (such as those described

in the 'The Role of the Distance Function Component' section) to organize the objects in the database.

There are several works presenting MAM proposals in the literature. Among the first proposed, there

were the ones called BK-trees (Burkhard-Keller-trees). The main idea of these structures consists in

choosing an arbitrary central object and applying a distance function to split the remaining objects into

several subsets. In this way, the indexing structure is built recursively, executing this procedure for each

non empty subset. A good overview about these and other indexing structures widely mentioned in the

literature, such as the VP-tree (Vantage Point tree) (Yianilos, 1993), the MVP-tree (Multi-Vantage Point

tree) (Bozkaya and Ozsoyoglu, 1997), the GNAT (Geometric Near-neighbor Access Tree) (Brin, 1995),

the M-tree (Ciaccia et al., 1997) and Slim-tree (Traina-Jr et al., 2002), can be found in (Hjaltason and

Samet, 2003).

Many of the methods mentioned above were developed to improve single-center similarity search

operations (such as k -nearest neighbor and range queries), using the triangle inequality property to prune

branches of the trees. Considering the k -NN queries, for example, there are several approaches proposed

to improve its performance, such as: branch-and-bound (Roussopoulos et al., 1995, Ciaccia et al., 1997,

Hjaltason and Samet, 2003), incremental (Hjaltason and Samet, 1995, Hjaltason and Samet, 1999) and

multi-step algorithms (Korn et al., 1996, Seidl and Kriegel, 1998). Other approaches are to estimate a

final limiting range for the query and to perform a sequence of “start small and grow” steps (Tasan and

Ozsoyoglu, 2004). All of these works refer to algorithms dealing with just one query center.

Regarding the case of multiple query centers, the proposals found in the literature are more recent.

The aggregate range query was first proposed in (Wu et al., 2000) and was used as a relevance feedback

mechanism for content-based image retrieval in a system named Falcon. Considering the aggregate k -

nearest neighbor queries, the first approach appeared in 2005 with the work presented in (Papadias et al.,

2005), which deals only with spatial data. The first strategies proposed considering the case of metric

space appeared more recently with the works presented in (Razente et al., 2008b, Razente et al., 2008a).

Moreover, there are also works considering the improvement of similarity join algorithms. The first

strategies were designed primarily for data in a vector space (Brinkhoff et al., 1993), but others were

developed considering data lying in metric spaces (Dohnal et al., 2003a, Dohnal et al., 2003b, Jacox and

Samet, 2008). The use of this type of query for improving data mining processes has also been explored

in works such as (Bohm and Krebs, 2002, Bohm and Krebs, 2004).

SUPPORTING SIMILARITY QUERIES IN RELATIONAL DBMS

There are several systems developed to query multimedia data by content, including similarity searching.

Many of them are prototypes aimed at testing techniques, without concerns about performance issues.

Search WWH ::

Custom Search

Home