Database Reference
In-Depth Information
relation tables, and perhaps other relation tables as well, a more elaborate
solution is needed, possibly one similar to that used in C-Store to manage
simple-transaction-type queries (see Section 7.3.2). In Cantor, this issue was
handled by designing a separate cache memory for MDB relation tables with
a least recently used (LRU) replacement regime, structured as a collection
of linked lists of metadata records, one for each relation, attribute, and b-
list storage structure. This amounts to adopting an NSM architecture for the
cache, to be used for internal queries only. When a user query involves an
MDB relation, the entire MDB is first “unified,” that is, all updates made to
the cache since the previous unification are written out to the transposed files
used to permanently store the columns of MDB tables.
7.2.6 Querying Compressed, Fully Transposed Files
Stonebraker et al., 8 while describing the architectural properties of the C-Store
system, acknowledge previous work on using compressed data in databases,
stating (p. 564) that “Roth and Van Horn 38 provide an excellent summary
of many of the techniques that have been developed. Our coding schemes are
similar to some of these techniques, all of which are derived from a long history
of work on the topic in the broader field of computer science
Our observa-
tion that it is possible to operate directly on compressed data has been made
before.” 39 Indeed, the capability to represent multiple values in a single field
to simultaneously apply an operation on all the values at once was exploited
earlier in the algorithm for CFTOF search described in Svensson. 16
Batory 15 showed that search algorithms designed for use with transposed
files could outperform commonly used techniques such as the use of inverted
files (indexes) in a large proportion of cases. In Svensson, 16 theoretical and
empirical results were presented, showing that conjunctive queries may be
evaluated even more e ciently if the transposed file structure is combined with
sorting with respect to the primary key followed by column-wise RLE data
compression, forming a CFTOF organization. The performance of a testbed
system was measured and compared with a commercially available database
system and with results from an analytical performance model. The results
showed that order-of-magnitude performance gains could indeed be achieved
by combining transposed file storage and data compression techniques.
In Anderson and Svensson 7 the authors later refined these results by com-
bining interpolation search, sequential search, and binary search into a poly-
algorithm which dynamically selects the appropriate method, given known
metadata. It is shown how this “modified CFTOF interpolation search” sig-
nificantly improves search performance over sequential CFTOF search in crit-
ical cases. The only situation where inverted file range search retains a clear
advantage in a CFTOF-structured database is in highly selective queries over
nonkey attributes. To add the complex and costly machinery of updatable
inverted files to handle that (in a read-optimized database) fairly uncommon
special case seems unwarranted in most analytic DBMS applications.
...
Search WWH ::




Custom Search