Database Reference
In-Depth Information
consider to query multimedia data is related to using indexing methods specially tailored to efficiently
answer similarity queries.
Indexing methods, such as B-tree and its variations, and hashing structures (a description of these
methods can be found in (Garcia-Molina et al., 2002)) are normally provided by DBMS. However,
although these indexing methods are enough to attend the needs of traditional application users, they
are not suitable for systems that require similarity searches, which usually deal with data that present
high dimensionality and do not present the total ordering property. In order to answer similarity queries
in generic metric spaces the most suitable indexing structures are the Metric Access Methods (MAM).
A MAM is a distance-based indexing method that employs only metrics (such as those described
in the 'The Role of the Distance Function Component' section) to organize the objects in the database.
There are several works presenting MAM proposals in the literature. Among the first proposed, there
were the ones called BK-trees (Burkhard-Keller-trees). The main idea of these structures consists in
choosing an arbitrary central object and applying a distance function to split the remaining objects into
several subsets. In this way, the indexing structure is built recursively, executing this procedure for each
non empty subset. A good overview about these and other indexing structures widely mentioned in the
literature, such as the VP-tree (Vantage Point tree) (Yianilos, 1993), the MVP-tree (Multi-Vantage Point
tree) (Bozkaya and Ozsoyoglu, 1997), the GNAT (Geometric Near-neighbor Access Tree) (Brin, 1995),
the M-tree (Ciaccia et al., 1997) and Slim-tree (Traina-Jr et al., 2002), can be found in (Hjaltason and
Samet, 2003).
Many of the methods mentioned above were developed to improve single-center similarity search
operations (such as k -nearest neighbor and range queries), using the triangle inequality property to prune
branches of the trees. Considering the k -NN queries, for example, there are several approaches proposed
to improve its performance, such as: branch-and-bound (Roussopoulos et al., 1995, Ciaccia et al., 1997,
Hjaltason and Samet, 2003), incremental (Hjaltason and Samet, 1995, Hjaltason and Samet, 1999) and
multi-step algorithms (Korn et al., 1996, Seidl and Kriegel, 1998). Other approaches are to estimate a
final limiting range for the query and to perform a sequence of “start small and grow” steps (Tasan and
Ozsoyoglu, 2004). All of these works refer to algorithms dealing with just one query center.
Regarding the case of multiple query centers, the proposals found in the literature are more recent.
The aggregate range query was first proposed in (Wu et al., 2000) and was used as a relevance feedback
mechanism for content-based image retrieval in a system named Falcon. Considering the aggregate k -
nearest neighbor queries, the first approach appeared in 2005 with the work presented in (Papadias et al.,
2005), which deals only with spatial data. The first strategies proposed considering the case of metric
space appeared more recently with the works presented in (Razente et al., 2008b, Razente et al., 2008a).
Moreover, there are also works considering the improvement of similarity join algorithms. The first
strategies were designed primarily for data in a vector space (Brinkhoff et al., 1993), but others were
developed considering data lying in metric spaces (Dohnal et al., 2003a, Dohnal et al., 2003b, Jacox and
Samet, 2008). The use of this type of query for improving data mining processes has also been explored
in works such as (Bohm and Krebs, 2002, Bohm and Krebs, 2004).
SUPPORTING SIMILARITY QUERIES IN RELATIONAL DBMS
There are several systems developed to query multimedia data by content, including similarity searching.
Many of them are prototypes aimed at testing techniques, without concerns about performance issues.
Search WWH ::




Custom Search