Database Reference
7.2.8 Two Recent Benchmark Studies
Stonebraker et al. [43] present results from a benchmarking study that
compares commercial implementations based on what the authors call
“specialized architectures” with conventional relational databases. The tests
cover a range of DBMS applications, including a standard data warehouse
benchmark (TPC-H) and several unconventional ones: a text database
application, message stream processing, and some computational scientific
applications. The “specialized architecture” system used in the data
warehouse benchmarks was Vertica, a recently released parallel, multinode,
shared-nothing vertical database product [44] designed along the lines of
C-Store. It uses a DSM data model, data compression, and sorting/indexing.
On these benchmarks, Vertica took between one and two orders of magnitude
less time than the comparison system, which ran in a large and expensive
RDBMS installation.
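The three ingredients named above (a DSM layout, sorting, and compression) can be illustrated with a minimal Python sketch. It is not Vertica's or C-Store's implementation. The sample relation, the function names, and the choice of run-length encoding as the compression scheme are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical sample relation, stored row by row as a row store would hold it.
rows = [
    {"order_id": 1, "region": "EU", "amount": 30},
    {"order_id": 2, "region": "US", "amount": 20},
    {"order_id": 3, "region": "EU", "amount": 50},
    {"order_id": 4, "region": "US", "amount": 10},
]

def to_columns(rows, sort_key):
    """Decompose rows into one list per attribute, after sorting on sort_key."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    return {name: [r[name] for r in ordered] for name in ordered[0]}

def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def sum_by_group(rle_groups, measures):
    """Aggregate a measure column using only the RLE-compressed group column."""
    totals, pos = {}, 0
    for value, run in rle_groups:
        totals[value] = totals.get(value, 0) + sum(measures[pos:pos + run])
        pos += run
    return totals

columns = to_columns(rows, sort_key="region")
region_rle = rle_encode(columns["region"])   # sorting makes the runs long
totals = sum_by_group(region_rle, columns["amount"])
```

In this sketch the aggregation reads only the two columns it needs, and sorting on `region` is what lets that column collapse into a few runs. A row store would have to scan every attribute of every row to answer the same query.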
Another database design and benchmarking study, using Semantic Web text
data, was reported in Abadi et al. [45] The vertical database used in this
study was an extension of C-Store capable of handling Semantic Web
applications, while the row-store system used for comparison was the open
source RDBMS PostgreSQL [67], which has been found to be more efficient
with sparse data than typical commercial database products (in this
application, NULL data values are abundant). The authors showed that storing
and processing Semantic Web data in Resource Description Framework (RDF)
format efficiently in a conventional RDBMS requires a creative
representation of the data in relations. More importantly, they showed that
RDF data may be handled most successfully by partitioning the data
vertically, so that its logical organization follows a full DSM. The authors
demonstrated an average performance advantage for C-Store of at least an
order of magnitude over PostgreSQL, even when the data are structured
optimally for the latter system.
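To make the contrast between the two representations concrete, the following is a minimal Python sketch. It assumes that vertical partitioning means one two-column (subject, object) table per RDF property, kept sorted by subject. The sample triples and all function names are hypothetical, and the sketch does not reproduce the schema or code used in the study.

```python
from collections import defaultdict

# Hypothetical RDF triples: (subject, property, object).
triples = [
    ("book1", "title", "Dune"),
    ("book1", "author", "Herbert"),
    ("book2", "title", "Emma"),
    ("book3", "author", "Austen"),
    ("book3", "year", "1815"),
]

def vertical_partition(triples):
    """Build one (subject, object) table per property, sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(pairs) for p, pairs in tables.items()}

def wide_table(triples):
    """Build one wide relation; a missing property becomes None (NULL)."""
    props = sorted({p for _, p, _ in triples})
    rows = defaultdict(lambda: dict.fromkeys(props))
    for s, p, o in triples:
        rows[s][p] = o
    return dict(rows)

def join_on_subject(left, right):
    """Merge-join two subject-sorted tables (one value per subject assumed)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] == right[j][0]:
            out.append((left[i][0], left[i][1], right[j][1]))
            i += 1
            j += 1
        elif left[i][0] < right[j][0]:
            i += 1
        else:
            j += 1
    return out

tables = vertical_partition(triples)
wide = wide_table(triples)
null_cells = sum(v is None for row in wide.values() for v in row.values())
titled_authors = join_on_subject(tables["title"], tables["author"])
```

In this toy data set the wide relation carries four NULL cells for only five triples, while the per-property tables store no NULLs at all. Because each table is sorted by subject, combining two properties reduces to a linear merge join.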
7.2.9 Scalability
Over the last decade, the largest data warehouses have grown from 5 to
100 terabytes, and by 2010 most of today's data warehouses may be 10 times
their current size. Since there are limits to the performance of any
individual processor or disk, all high-performance computers include
multiple processors and disks. Accordingly, a high-performance DBMS must
take advantage of multiple disks and multiple processors. DeWitt et al. [46]
briefly discuss three approaches to achieving the required scalability.
In a shared-memory computer system, all processors share a single mem-
ory and a single set of disks. Distributed locking and commit protocols are
not needed, since the lock manager and buffer pool are both stored in the
memory system where they can be accessed by all processors. However,
since all I/O and memory requests have to be transferred over the same
bus that all processors share, the bandwidth of this bus rapidly becomes a