5.6.2 Ongoing Work
The most obvious approach for creating such a platform is to integrate a relational
database and full-text search index into a single logical product. Practically all large
database vendors now offer full-text search capabilities as part of their database
products, such as Oracle Text or MySQL Fulltext. However, performance still seems
to be an issue in this context, since Linping and Lidong [27] found in a recent
comparison of simple keyword searches in Lucene and Oracle Text that Lucene is
about an order of magnitude faster as long as the result set does not become too large
(under roughly 1,500 results). Since a combination with the various join
operations required for API-based retrieval is likely to slow down RDBMS-based retrieval
even further, it is clear that pure FTSF platforms using the techniques outlined above
will remain superior for some time. An interesting practical alternative is to run a
database alongside an FTSF index on the same corpus of artifacts (as is already
done, for example, in the backend of Sourcerer [35]), with software that automatically keeps the
latter up to date with the former. Hibernate Search [22] is a recently developed open
source framework that achieves this by making full-text search available on a domain
model that the object-relational mapper Hibernate stores in a database. This offers
the user the best of both worlds in terms of querying options, but at the expense of
maintaining two stores of the search base, one in the RDBMS and the other in the
FTSF (i.e. Lucene).
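The synchronization idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the API of Hibernate Search: the class HybridStore, its artifact table and its method names are hypothetical, SQLite stands in for the RDBMS, and a plain in-memory inverted index stands in for Lucene. Every write goes through a single save method that updates both stores, so the index does not lag behind the database.

```python
import re
import sqlite3
from collections import defaultdict


class HybridStore:
    """Toy dual store: a relational table plus an inverted index kept in sync.

    All names here are hypothetical; this is not how Hibernate Search is
    implemented, only a sketch of the "two stores, one write path" idea.
    """

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE artifact (id INTEGER PRIMARY KEY, name TEXT, body TEXT)"
        )
        self.index = defaultdict(set)  # term -> set of artifact ids

    @staticmethod
    def _terms(text):
        # Very naive tokenizer: lower-cased alphanumeric runs.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def save(self, artifact_id, name, body):
        # Remove stale postings if the artifact is being updated.
        row = self.db.execute(
            "SELECT body FROM artifact WHERE id = ?", (artifact_id,)
        ).fetchone()
        if row is not None:
            for term in self._terms(row[0]):
                self.index[term].discard(artifact_id)
        # Write to the relational store inside a transaction.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO artifact (id, name, body) VALUES (?, ?, ?)",
                (artifact_id, name, body),
            )
        # Mirror the new content into the full-text index.
        for term in self._terms(body):
            self.index[term].add(artifact_id)

    def search(self, term):
        # Keyword lookup served by the index, not by the database.
        return sorted(self.index.get(term.lower(), set()))
```

In this sketch the cost mentioned above is directly visible: each artifact is held twice, once as a row and once as a set of postings.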
The database part of such a hybrid system can be searched using standard SQL
queries, thus allowing arbitrary structured searches to take advantage of its relational
structure. For example, the new schema for the upcoming version of Merobase, built
using Hibernate Search, is shown in Fig. 5.3 as a UML class diagram. A similar
schema for the Sourcerer search engine is documented in another chapter of this
book [35]. The main advantage of Hibernate Search is that it automatically creates
and synchronizes a Lucene full-text search index from the content in the RDBMS.
The overall memory required is obviously greater, but the Lucene index itself is
much leaner than the original. The new version of Merobase therefore supports
all the FTSF capabilities of the original, while also allowing SQL queries to be applied to the
same search base. The transaction-safe updating capabilities of the RDBMS also allow
the content of the search base to be updated much more dynamically by multiple
concurrent crawlers and data mining engines.
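As an illustration of the kind of structured search that SQL makes possible, the following sketch finds all methods that take a given version of a component as a parameter. The three-table schema (component, method, parameter) and the function names are assumptions made up for this example; they are not the Merobase schema of Fig. 5.3, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only.
SCHEMA = """
CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
CREATE TABLE method (
    id INTEGER PRIMARY KEY,
    owner_id INTEGER REFERENCES component(id),
    name TEXT
);
CREATE TABLE parameter (
    method_id INTEGER REFERENCES method(id),
    type_id INTEGER REFERENCES component(id)
);
"""

# Join methods to their parameter types and filter on the type's version.
QUERY = """
SELECT owner.name, m.name
FROM method AS m
JOIN component AS owner ON owner.id = m.owner_id
JOIN parameter AS p ON p.method_id = m.id
JOIN component AS ptype ON ptype.id = p.type_id
WHERE ptype.name = ? AND ptype.version = ?
ORDER BY owner.name, m.name
"""


def methods_taking(db, type_name, version):
    """Return (owner, method) pairs whose parameters use the given type version."""
    return db.execute(QUERY, (type_name, version)).fetchall()


def demo_db():
    """Build a tiny in-memory search base with two versions of one component."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO component VALUES (?, ?, ?)",
        [(1, "Session", "2.0"), (2, "Session", "3.0"),
         (3, "LegacyDao", "1.0"), (4, "UserDao", "1.0")],
    )
    db.executemany(
        "INSERT INTO method VALUES (?, ?, ?)",
        [(1, 3, "open"), (2, 4, "find")],
    )
    db.executemany("INSERT INTO parameter VALUES (?, ?)", [(1, 1), (2, 2)])
    return db
```

Calling methods_taking(demo_db(), "Session", "3.0") returns only the method whose parameter is the 3.0 version of Session. A keyword index alone cannot express this constraint, because the version belongs to the parameter type rather than to the document being retrieved.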
Using a relational database, it is also possible to support precise searches for
components with particular combinations of properties on the fly, for example
searching for a specific version of a component that is used as a parameter in a
method. Consider the case of a developer who is faced with the task of learning to
use a particular framework. With an FTSF-based system, it is only possible to filter
for the version of the actual search result, i.e. the file that illustrates how to use
a framework. However, such searches often deliver many unsuitable results, such as
outdated tutorials, documents describing old versions of the framework
that contain deprecated methods, or old orchestrations of components that are no
longer applicable (e.g. initialization). A software developer trying to discover how
to use the latest version of a framework would find it extremely helpful to be able