Emerging Database Systems in Support of Scientific Data - Scientific Data Management

7.2.8 Two Recent Benchmark Studies

In Stonebraker et al. [43], results from a benchmarking study are presented,
and performance comparisons are made between commercial implementations
based on what these authors call “specialized architectures” and conventional
relational databases. The tests involve a range of DBMS applications, including
both a standard data warehouse benchmark (TPC-H) and several unconventional
ones, namely a text database application, message stream processing, and some
computational scientific applications. The “specialized architecture” system
used in the data warehouse benchmarks was Vertica, a recently released
parallel, multinode, shared-nothing, vertical database product [44] designed
along the lines of C-Store. It uses a DSM data model, data compression, and
sorting/indexing. On these benchmarks, Vertica took between one and two orders
of magnitude less time than the comparison system, which ran on a large and
expensive RDBMS installation.
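
The three ingredients named above (a DSM layout, sorting, and compression) can be illustrated with a minimal sketch. This is not Vertica's or C-Store's implementation; the toy table, the column names, and the helper functions `to_columns` and `rle_encode` are assumptions made up for illustration, and run-length encoding stands in for whatever compression schemes the product actually uses.

```python
from itertools import groupby

# Hypothetical toy table, stored row-wise as a conventional RDBMS would.
rows = [
    ("2008-01-02", "EU", 10),
    ("2008-01-01", "US", 7),
    ("2008-01-01", "EU", 3),
    ("2008-01-02", "US", 5),
]

def to_columns(rows, names):
    """Decompose row tuples into one list per attribute (DSM-style layout)."""
    return {name: [r[i] for r in rows] for i, name in enumerate(names)}

def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

# Sorting on (region, day) before decomposing creates long runs of equal
# values in the leading sort column, which is what makes RLE effective.
sorted_rows = sorted(rows, key=lambda r: (r[1], r[0]))
columns = to_columns(sorted_rows, ("day", "region", "amount"))
region_rle = rle_encode(columns["region"])

# A query that aggregates one attribute only has to read that column.
total_amount = sum(columns["amount"])
```

In this sketch the sorted `region` column collapses to one pair per distinct region, and the aggregate touches a single list rather than every full row, which is the intuition behind the speedups reported for column-oriented systems on warehouse queries.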

Another database design and benchmarking study, using Semantic Web text
data, was reported in Abadi et al. [45] The vertical database used in this
study was an extension of C-Store capable of dealing with Semantic Web
applications, while the row-store system used for comparison was the open
source RDBMS PostgreSQL [67], which has been found to be more efficient when
dealing with sparse data than typical commercial database products (in this
application, NULL data values are abundant). The authors showed that storing
and processing Semantic Web data in Resource Description Framework (RDF)
format efficiently in a conventional RDBMS requires creative representation
of the data in relations. More importantly, they showed that RDF data may be
most successfully handled by vertically partitioning the data, so that it
logically follows a full DSM. The authors demonstrated an average performance
advantage for C-Store of at least an order of magnitude over PostgreSQL, even
when the data are structured optimally for the latter system.
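
The idea of vertically partitioning RDF data can be sketched as follows: instead of one wide relation with a column per property (and many NULLs), each property gets its own two-column (subject, object) table. This is only an illustration under stated assumptions; the sample triples and the functions `vertically_partition` and `subjects_with` are hypothetical and are not taken from the study.

```python
from collections import defaultdict

# Hypothetical RDF triples: (subject, property, object).
triples = [
    ("book1", "title", "Dune"),
    ("book1", "author", "Herbert"),
    ("book2", "title", "Emma"),
    ("book2", "year", "1815"),
]

def vertically_partition(triples):
    """Build one (subject, object) table per property, sorted by subject."""
    tables = defaultdict(list)
    for subject, prop, obj in triples:
        tables[prop].append((subject, obj))
    for prop in tables:
        tables[prop].sort()
    return dict(tables)

def subjects_with(tables, prop, value):
    """Return subjects whose property `prop` equals `value`.

    Only the table for `prop` is scanned; other properties are never read.
    """
    return [s for s, o in tables.get(prop, []) if o == value]

tables = vertically_partition(triples)
matches = subjects_with(tables, "year", "1815")
```

Note that `book1` has no `year`, yet nothing is stored for that missing value: absent properties simply have no row in the corresponding table, which is why sparse data with abundant NULLs fits this layout well.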

7.2.9 Scalability

Over the last decade, the largest data warehouses have grown from 5 to
100 terabytes, and by 2010 most data warehouses may be 10 times larger than
they are today. Since there are limits to the performance of any individual
processor or disk, all high-performance computers include multiple processors
and disks. Accordingly, a high-performance DBMS must take advantage of
multiple disks and multiple processors. In DeWitt et al. [46], three
approaches to achieving the required scalability are briefly discussed.

In a shared-memory computer system, all processors share a single memory
and a single set of disks. Distributed locking and commit protocols are not
needed, since the lock manager and buffer pool are both stored in the memory
system, where they can be accessed by all processors. However, since all I/O
and memory requests have to be transferred over the same bus that all
processors share, the bandwidth of this bus rapidly becomes a
