PERFORMING SIMILARITY QUERIES USING DBMS
Besides adding new constructions to the SQL language to represent similarity queries, a DBMS also needs
adequate techniques to store multimedia data and algorithms to execute similarity queries over them.
This section describes two case studies that developed experimental systems to meet these
requirements on existing DBMS.
Both of them are open source software and can be enhanced with domain-specific feature extractors
and distance functions. The first one ('A Blade for Similarity Query Processing' section) is implemented
as an intermediate layer over the DBMS interface that recognizes the SQL extension for similarity
described in the 'A SQL Extension for Similarity Data Management' section. It is conceived as a layer in
order to be DBMS-independent, since it introduces new logic to the query processor. It allows managing similarity
data integrated with simple data in a plain and transparent way and executes the similarity operations
using efficient algorithms. The second system ('A DBMS Module for Similarity Searching' section) is
implemented as a module attached to the DBMS core. It is tightly coupled to the DBMS query processor
and also executes similarity operations efficiently. However, as it does not interpret the extended
commands, it relies on the verbose standard SQL syntax and is limited to the built-in query rewrite rules.
A Blade for Similarity Query Processing
This section presents an academic open source engine called SIREN (SImilarity Retrieval ENgine)
(Barioni et al., 2006). It was developed both to validate the SQL extension presented in the previous
section and to explore the issues related to supporting similarity queries natively inside SQL, which
is important for optimizing the full set of search operations involved in each query posed.
SIREN acts as a blade between a conventional DBMS and the application programs, intercepting
every SQL command sent by the application. If a command has neither a similarity construction nor a
reference to complex objects, SIREN sends it unchanged to the underlying DBMS and relays the answer
from the DBMS to the application program. Therefore, when only conventional commands are posed
by the application, SIREN is transparent. When the SQL command has similarity-related constructions
or references to complex data, the command is rewritten, the similarity-related operations are executed
internally, and the underlying DBMS is used only to execute the conventional data operations.
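The routing decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SIREN's actual code: `sqlite3` stands in for the underlying DBMS, the marker list (`NEAR`, `STILLIMAGE`) is a hypothetical placeholder for the keywords of the SQL extension, and the class and function names are invented for the example. A real layer would parse the command and consult its catalog of complex attributes rather than scan for keywords.

```python
import re
import sqlite3

# Hypothetical markers for similarity constructions and complex data types.
# The real keywords are defined by the SQL extension, not by this sketch.
SIMILARITY_MARKERS = ("NEAR", "STILLIMAGE")
_MARKER_RE = re.compile(
    r"\b(" + "|".join(SIMILARITY_MARKERS) + r")\b", re.IGNORECASE
)


def needs_similarity_processing(sql):
    """Return True if the command mentions any similarity-related marker.

    Naive keyword scan: it would also match a marker inside a string literal.
    """
    return _MARKER_RE.search(sql) is not None


class SimilarityLayer:
    """Sits between the application and a DBMS connection."""

    def __init__(self, conn):
        self.conn = conn

    def execute(self, sql, params=()):
        if not needs_similarity_processing(sql):
            # Conventional command: forward it unchanged and relay the answer.
            return self.conn.execute(sql, params).fetchall()
        return self._execute_similarity(sql, params)

    def _execute_similarity(self, sql, params):
        # Placeholder for the rewrite step and the internal execution of the
        # similarity operations, which are outside the scope of this sketch.
        raise NotImplementedError("similarity rewriting is not sketched here")
```

With an in-memory database, conventional statements pass straight through the layer, while a statement containing one of the assumed markers is diverted to the (unimplemented) similarity path.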
Multimedia objects are stored as Binary Large Objects (BLOB data types), as are their extracted
features. Feature extraction is usually costly, but it needs to be executed only once for each object,
when the object is stored in the database. As the user does not provide places in the relations to store
the extracted attributes, the system stores them and associates them with the multimedia objects
transparently to the user.
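A minimal sketch of this store-once behaviour follows, assuming `sqlite3` as a stand-in DBMS. The table names (`sys_image`, `photo`), the link column `image_ref`, the JSON encoding of the feature vector, and the toy `byte_histogram` extractor are all hypothetical choices made for the example; they are not taken from SIREN.

```python
import json
import sqlite3


def byte_histogram(data, bins=4):
    """Toy stand-in for a feature extractor: normalized histogram of byte values."""
    counts = [0] * bins
    for b in data:
        counts[b * bins // 256] += 1
    total = len(data) or 1  # avoid division by zero for empty objects
    return [c / total for c in counts]


def create_schema(conn):
    # System-managed table: the raw object and its features, both as BLOBs.
    conn.execute(
        "CREATE TABLE sys_image ("
        "obj_id INTEGER PRIMARY KEY, data BLOB NOT NULL, features BLOB NOT NULL)"
    )
    # User-visible table: conventional attributes plus a link to the object.
    conn.execute(
        "CREATE TABLE photo ("
        "id INTEGER PRIMARY KEY, title TEXT, "
        "image_ref INTEGER REFERENCES sys_image(obj_id))"
    )


def store_photo(conn, title, image_bytes):
    """Extract features once, at insertion time, and store them with the object."""
    features = json.dumps(byte_histogram(image_bytes)).encode("utf-8")
    cur = conn.execute(
        "INSERT INTO sys_image (data, features) VALUES (?, ?)",
        (image_bytes, features),
    )
    obj_id = cur.lastrowid
    conn.execute(
        "INSERT INTO photo (title, image_ref) VALUES (?, ?)", (title, obj_id)
    )
    conn.commit()
    return obj_id


def load_features(conn, obj_id):
    """Read the stored feature vector back without re-running the extractor."""
    row = conn.execute(
        "SELECT features FROM sys_image WHERE obj_id = ?", (obj_id,)
    ).fetchone()
    return json.loads(row[0].decode("utf-8"))
```

Because the features are persisted next to the object, later queries can call `load_features` instead of paying the extraction cost again.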
To store a complex object along with its extracted feature vectors, SIREN changes the definition of
user-defined tables that have complex attributes as follows. Each complex attribute (e.g. a STILLIMAGE
column) is changed to a reference to a system-controlled table whose attributes hold both the object's
binary data and all the features obtained by every extractor used in each metric associated
with the attribute. A new table is created for each complex attribute. Whenever a new image
is stored in the database, SIREN intercepts the INSERT command and stores the non-image attributes in
the user table and the images in the corresponding system tables. Thereafter, SIREN calls the feature
extractors and stores their outputs in the corresponding system tables. Whenever the user asks for data