Emerging Database Systems in Support of Scientific Data - Scientific Data Management


replicated tables. For example, if all the columns in a query are indexed, the query can be substantially sped up by scanning the shorter index records instead of touching the wide records of the main table. We illustrate this with the next query example. It extracts celestial objects that are low-z quasar candidates, a property specified through correlations between the objects' magnitudes in different color bands (query SX11 in Gray et al. [58]).

SELECT g, run, rerun, camcol, field, objID
FROM Galaxy
WHERE ( (g <= 22)
  and (u - g >= -0.27) and (u - g < 0.71)
  and (g - r >= -0.24) and (g - r < 0.35)
  and (r - i >= -0.27) and (r - i < 0.57)
  and (i - z >= -0.35) and (i - z < 0.70) )

The query predicates do not allow an efficient index search for the qualifying rows; instead, all the rows must be scanned. However, a full table scan can be avoided by using available indexes that contain all the necessary columns. The data volume transferred for the 150 GB dataset is 1.8 GB, a substantial reduction with respect to the full table scan, but still about twice as large as the 850 MB transferred in MonetDB for the same query. The reason is that the indexes chosen for the query execution contain several additional columns irrelevant for this query.
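The idea of answering a query from an index that contains all the necessary columns can be tried out on a small scale. The following is a minimal sketch, not the setup used for the measurements in this chapter: it uses SQLite through Python's standard `sqlite3` module, and the miniature `Galaxy` table, its extra columns (`ra_deg`, `dec_deg`, `petro_rad`, `notes`), the index name `idx_galaxy_colors`, and the three sample rows are all hypothetical. The query plan printed at the end shows whether the planner reads only the index; the exact wording of the plan depends on the SQLite version.

```python
import sqlite3

# Hypothetical miniature of the Galaxy table: the ten columns the query
# touches, plus a few extra attributes that make the base rows wider.
SCHEMA = """
CREATE TABLE Galaxy (
    objID INTEGER PRIMARY KEY,
    run INTEGER, rerun INTEGER, camcol INTEGER, field INTEGER,
    u REAL, g REAL, r REAL, i REAL, z REAL,
    ra_deg REAL, dec_deg REAL, petro_rad REAL, notes TEXT
);
-- The index holds every column the query needs (objID is the rowid,
-- which SQLite stores in every index entry).
CREATE INDEX idx_galaxy_colors
    ON Galaxy (g, u, r, i, z, run, rerun, camcol, field);
"""

# The low-z quasar candidate query from the text.
QUERY = """
SELECT g, run, rerun, camcol, field, objID
FROM Galaxy
WHERE ( (g <= 22)
    AND (u - g >= -0.27) AND (u - g < 0.71)
    AND (g - r >= -0.24) AND (g - r < 0.35)
    AND (r - i >= -0.27) AND (r - i < 0.57)
    AND (i - z >= -0.35) AND (i - z < 0.70) )
"""

# Made-up sample rows: only the first one satisfies all predicates.
ROWS = [
    (1, 756, 1, 3, 100, 19.5, 19.3, 19.2, 19.1, 19.0, 180.0, 0.5, 2.1, "candidate"),
    (2, 756, 1, 3, 101, 23.4, 23.0, 22.9, 22.8, 22.7, 181.0, 0.6, 1.7, "too faint"),
    (3, 756, 1, 4, 102, 21.0, 19.0, 18.9, 18.8, 18.7, 182.0, 0.7, 3.0, "u - g too large"),
]


def run_demo():
    """Return (plan_details, result_rows) for the query on an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    placeholders = ",".join("?" * 14)
    conn.executemany(f"INSERT INTO Galaxy VALUES ({placeholders})", ROWS)
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return plan, rows


if __name__ == "__main__":
    plan, rows = run_demo()
    for detail in plan:
        print("plan:", detail)
    for row in rows:
        print("row:", row)
```

Because the index leads with `g`, SQLite may use the `g <= 22` predicate to narrow the index range; the color-difference predicates are still evaluated entry by entry. Dropping the unused wide columns from the index is what keeps the index records shorter than the base rows.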

7.5.3 Improved Performance

In addition to the efficient vertical access pattern, MonetDB employs a number of techniques to provide high performance for analytical applications. Among these are runtime optimization, such as choosing the best algorithm fitting the argument properties, and efficient cache-conscious algorithms exploiting modern computer architecture. To demonstrate the net effect of these techniques on the performance experienced by the end user, we performed a few experiments with the above table- and index-scan queries against both the 1.5 GB and 150 GB datasets. The elapsed times in seconds are shown in Table 7.1. The performance of the vertical database for index-supported queries is comparable for the small dataset, and about 30% better for the large dataset. Queries involving full table scans are sped up by a factor of about 5 for the large dataset.

TABLE 7.1 Elapsed times in seconds for two types of queries against a “small” (1.5 GB) and a “large” (150 GB) dataset

                       1.5 GB                    150 GB
                Table Scan  Index Scan    Table Scan  Index Scan
Row-store           6.6        0.4           245          24
Column-store        0.4        0.47           53          16
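The ratios quoted in the text can be recomputed directly from Table 7.1. The short sketch below is only illustrative arithmetic over the published numbers; the helper names `speedup` and `time_saved_percent` are our own and do not come from the original experiments.

```python
# Elapsed times in seconds, copied from Table 7.1.
ELAPSED = {
    "1.5 GB": {
        "table scan": {"row": 6.6, "column": 0.4},
        "index scan": {"row": 0.4, "column": 0.47},
    },
    "150 GB": {
        "table scan": {"row": 245.0, "column": 53.0},
        "index scan": {"row": 24.0, "column": 16.0},
    },
}


def speedup(dataset, query):
    """Row-store time divided by column-store time (greater than 1 favors the column-store)."""
    t = ELAPSED[dataset][query]
    return t["row"] / t["column"]


def time_saved_percent(dataset, query):
    """Reduction in elapsed time relative to the row-store, in percent."""
    t = ELAPSED[dataset][query]
    return 100.0 * (t["row"] - t["column"]) / t["row"]


if __name__ == "__main__":
    for dataset, queries in ELAPSED.items():
        for query in queries:
            print(f"{dataset:>7} {query:<10} "
                  f"speedup {speedup(dataset, query):5.2f}x, "
                  f"time saved {time_saved_percent(dataset, query):6.1f}%")
```

For the large dataset this gives a table-scan speedup of roughly 4.6 and an index-scan time reduction of roughly one third, in line with the rounded figures stated above.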
