Querying Multimedia Data by Similarity in Relational DBMS - Advanced Database Query Systems


PERFORMING SIMILARITY QUERIES USING DBMS

Besides adding new constructions to the SQL language to represent similarity queries, a DBMS also needs adequate techniques to store multimedia data and algorithms to execute similarity queries over them. This section describes two case studies that developed experimental systems to address these requirements on existing DBMSs.

Both systems are open source software and can be enhanced with domain-specific feature extractors and distance functions. The first one ('A Blade for Similarity Query Processing' section) is implemented as an intermediate layer over the DBMS interface that recognizes the SQL extension for similarity described in the 'A SQL Extension for Similarity Data Management' section. It is conceived as a layer so that it remains DBMS-independent, since it introduces new logic to the query processor. It allows managing similarity data integrated with simple data in a plain and transparent way and executes the similarity operations using efficient algorithms. The second system ('A DBMS Module for Similarity Searching' section) is implemented as a module attached to the DBMS core. It is tightly coupled to the DBMS query processor and also executes similarity operations efficiently. However, as it does not interpret the extended commands, it relies on the verbose standard SQL syntax and is limited to the built-in query rewrite rules.

A Blade for Similarity Query Processing

This section presents an academic open source engine called SIREN (SImilarity Retrieval ENgine) (Barioni et al., 2006). It was developed both to validate the SQL extension presented in the previous section and to explore the issues related to supporting similarity queries natively inside SQL, which is important for optimizing the full set of search operations involved in each query posed.

SIREN acts like a blade between a conventional DBMS and the application programs, intercepting every SQL command sent from the application. If a command has neither a similarity construction nor a reference to complex objects, SIREN sends it unchanged to the underlying DBMS and relays the answer from the DBMS to the application program. Therefore, when the application poses only conventional commands, SIREN is transparent. When the SQL command has similarity-related constructions or references to complex data, the command is rewritten, the similarity-related operations are executed internally, and the underlying DBMS is used only to execute the conventional data operations.
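The pass-through behavior described above can be sketched roughly as follows. This is only an illustration, not SIREN's actual code: the class `SimilarityLayer`, the keyword list in `SIMILARITY_KEYWORDS`, and the use of SQLite as the underlying DBMS are all assumptions made for the example, and the real grammar is the one defined by the SQL extension from the previous section.

```python
import re
import sqlite3

# Hypothetical keyword list used only for this sketch; the real set of
# similarity constructions is defined by the SQL extension, not here.
SIMILARITY_KEYWORDS = re.compile(r"\b(NEAR|FAR|STILLIMAGE|METRIC)\b", re.IGNORECASE)


class SimilarityLayer:
    """Toy stand-in for a layer sitting between the application and the DBMS."""

    def __init__(self, connection, complex_columns=()):
        self.connection = connection
        # Names of columns known to hold complex (e.g. image) data.
        self.complex_columns = {name.lower() for name in complex_columns}
        self.log = []  # records which path each command took

    def needs_similarity_handling(self, sql):
        # Naive textual check: a real parser would ignore string literals.
        if SIMILARITY_KEYWORDS.search(sql):
            return True
        words = set(re.findall(r"[A-Za-z_][A-Za-z_0-9]*", sql.lower()))
        return bool(words & self.complex_columns)

    def execute(self, sql, params=()):
        if not self.needs_similarity_handling(sql):
            # Conventional command: forward it unchanged and relay the answer.
            self.log.append("passthrough")
            return self.connection.execute(sql, params).fetchall()
        # Similarity command: a real layer would rewrite the command here and
        # run the similarity operations itself; that part is not sketched.
        self.log.append("similarity")
        raise NotImplementedError("similarity rewriting is outside this sketch")
```

With an in-memory SQLite connection, a command such as `SELECT name FROM patient` goes straight to the DBMS, while any command mentioning a registered complex column or one of the assumed keywords is routed to the similarity path.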

Multimedia objects, as well as their extracted features, are stored as Binary Large Objects (BLOB data types). Feature extraction is usually costly, but it needs to be executed only once for each object, when the object is stored in the database. As the user does not provide places in the relations to store the extracted attributes, the system handles their storage and their association with the multimedia objects in a way that is transparent to the user.
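As a minimal sketch of the extract-once idea, the example below runs a feature extractor when an object is stored and keeps the result as a binary blob next to the raw data, so later comparisons never re-run the extractor. Everything here is hypothetical and chosen for brevity: the gray-level histogram extractor, the Manhattan distance, the `ObjectStore` class, and the packed-doubles blob layout are assumptions, not the extractors or storage format that SIREN ships with.

```python
import struct


def gray_histogram(pixels, bins=4):
    """Hypothetical extractor: normalized histogram of 0-255 gray levels."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [count / total for count in counts]


def to_blob(features):
    """Serialize a feature vector as little-endian doubles."""
    return struct.pack(f"<{len(features)}d", *features)


def from_blob(blob):
    """Inverse of to_blob: 8 bytes per double."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


def manhattan(a, b):
    """Hypothetical distance function (L1) between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


class ObjectStore:
    """In-memory stand-in for system-managed storage of objects and features."""

    def __init__(self, extractor):
        self.extractor = extractor
        self.rows = {}        # object id -> (raw bytes, feature blob)
        self.extractions = 0  # how many times the extractor has run

    def store(self, object_id, raw):
        # Extraction happens exactly once, when the object is stored.
        self.extractions += 1
        features = self.extractor(raw)  # iterating bytes yields ints 0..255
        self.rows[object_id] = (raw, to_blob(features))

    def distance(self, id_a, id_b):
        # Comparisons read the stored features; the extractor is not called.
        return manhattan(from_blob(self.rows[id_a][1]), from_blob(self.rows[id_b][1]))
```

The application only ever hands over the raw object; where the features live and how they are tied to the object stays inside the store, which mirrors the transparency the text describes.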

To store a complex object along with its extracted feature vectors, SIREN changes the definition of user-defined tables that have complex attributes as follows. Each complex attribute (e.g., a STILLIMAGE column) is changed to a reference to a system-controlled table whose attributes hold both the object's binary data and all features obtained by every extractor used in each metric associated with the attribute. A new table is created for each complex attribute. Whenever a new image is stored in the database, SIREN intercepts the INSERT command and stores the non-image attributes in the user table and the images in the corresponding system tables. Thereafter, SIREN calls the feature extractors and stores their outputs in the corresponding system tables. Whenever the user asks for data
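The table rewriting and INSERT handling described above might look roughly like the following sketch, which uses SQLite only because it is self-contained. The table names `exam` and `sys_exam_photo`, the column names, and the `mean_gray` extractor are all hypothetical; SIREN's real catalog layout and naming are not shown in this section.

```python
import sqlite3
import struct


def mean_gray(raw):
    """Hypothetical one-value extractor: mean byte value of the object."""
    return [sum(raw) / len(raw)] if raw else [0.0]


# One feature column per extractor (names are assumptions for this sketch).
EXTRACTORS = {"mean_gray": mean_gray}


def create_tables(conn):
    # System-controlled table: binary data plus one column per extractor.
    conn.execute(
        "CREATE TABLE sys_exam_photo ("
        "oid INTEGER PRIMARY KEY, data BLOB, mean_gray BLOB)"
    )
    # User table: the STILLIMAGE column becomes a reference to the system table.
    conn.execute(
        "CREATE TABLE exam ("
        "id INTEGER PRIMARY KEY, patient TEXT, "
        "photo INTEGER REFERENCES sys_exam_photo(oid))"
    )


def insert_exam(conn, exam_id, patient, image_bytes):
    # Step 1: the image goes to the system table, the rest to the user table.
    oid = conn.execute(
        "INSERT INTO sys_exam_photo (data) VALUES (?)", (image_bytes,)
    ).lastrowid
    conn.execute("INSERT INTO exam VALUES (?, ?, ?)", (exam_id, patient, oid))
    # Step 2: run each extractor and store its output next to the binary data.
    for column, extractor in EXTRACTORS.items():
        features = extractor(image_bytes)
        blob = struct.pack(f"<{len(features)}d", *features)
        conn.execute(
            f"UPDATE sys_exam_photo SET {column} = ? WHERE oid = ?", (blob, oid)
        )
    conn.commit()
    return oid
```

In this sketch the user would have declared `exam(id, patient, photo STILLIMAGE)`; the layer is assumed to create both tables on the user's behalf and to route each part of an INSERT to the right one.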
