Emerging Database Systems in Support of Scientific Data - Scientific Data Management


7.4.3 Efficiency Advantages of Using the BAT Algebra
7.4.4 Further Improvements
7.4.5 Assessment of the Benefits of Vertical Organization
7.5 Experience with SkyServer Data Warehouse Using MonetDB
7.5.1 Application Description and Planned Experiments
7.5.2 Efficient Vertical Data Access for Disk-Bound Queries
7.5.3 Improved Performance
7.5.4 Reduced Redundancy and Storage Needs
7.5.5 Flexibility
7.5.6 Use Cases Where Vertical Databases May Not Be Appropriate
7.5.7 Conclusions and Future Work
7.6 Extremely Large Databases and SciDB
7.6.1 Differences between the Requirements of Scientific and Commercial Databases
7.6.2 The Array Data Model in SciDB
7.6.2.1 Definition and Creation
7.6.2.2 Operators
7.6.3 Data Overwrite and Provenance
7.6.4 Uncertainty
7.6.5 Storage Layout
Acknowledgments
References

7.1 Introduction to Vertical Databases

7.1.1 Basic Concepts

Consider a high-energy physics experiment in which elementary particles are accelerated to nearly the speed of light and made to collide. These collisions generate a large number of additional particles. For each collision, called an event, about 1-10 MB of raw data are collected. The rate of these collisions is about 10 per second, corresponding to hundreds of millions or a few billion events per year. Such events are also generated by large-scale simulations. After the raw data are collected, they undergo a reconstruction phase, in which each event is analyzed to determine the particles it produced and to extract hundreds of summary properties (such as the total energy of the event, momentum, and number of particles of each type).
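The event and data volumes quoted above can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes the experiment records events continuously for a full year and uses decimal units (1 TB = 1,000,000 MB), neither of which is stated in the text.

```python
# Back-of-the-envelope estimate of yearly event count and raw data volume.
# Assumption (not from the text): the detector runs continuously all year.
SECONDS_PER_YEAR = 365 * 24 * 3600

EVENTS_PER_SECOND = 10        # collision rate quoted in the text
RAW_MB_PER_EVENT = (1, 10)    # lower and upper bound quoted in the text

events_per_year = EVENTS_PER_SECOND * SECONDS_PER_YEAR

# Convert MB to TB using decimal units (1 TB = 1,000,000 MB).
raw_tb_per_year = tuple(events_per_year * mb / 1_000_000 for mb in RAW_MB_PER_EVENT)

print(f"events per year: {events_per_year:,}")
print(f"raw data per year: {raw_tb_per_year[0]:,.0f} to {raw_tb_per_year[1]:,.0f} TB")
```

Under these assumptions the rate of 10 events per second gives roughly 315 million events per year, which falls in the "hundreds of millions" range; the raw data then amount to a few hundred terabytes up to a few petabytes per year.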

To illustrate the concept of vertical versus horizontal organization of data, consider a dataset of a billion events, each having 200 properties, with values labeled $V_{0,1}$, $V_{0,2}$, and so on. Conceptually, the entire collection of summary data can be represented as a table with a billion rows and 200 columns as
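The difference between the two organizations can be sketched on a toy table. The example below is a minimal illustration, not the layout of any particular system: the table sizes, the `value` function, and the helper names are hypothetical stand-ins, and properties are numbered from 0 for simplicity.

```python
from array import array

NUM_EVENTS = 5        # stand-in for a billion events
NUM_PROPERTIES = 4    # stand-in for 200 properties


def value(event, prop):
    """Hypothetical placeholder for the table entry V[event, prop]."""
    return float(event * 100 + prop)


# Horizontal organization: one record per event, holding all its properties.
rows = [
    [value(e, p) for p in range(NUM_PROPERTIES)]
    for e in range(NUM_EVENTS)
]

# Vertical organization: one contiguous array per property, across all events.
columns = [
    array("d", (value(e, p) for e in range(NUM_EVENTS)))
    for p in range(NUM_PROPERTIES)
]


def total_horizontal(rows, prop):
    # Visits every record, even though only one field of each is needed.
    return sum(row[prop] for row in rows)


def total_vertical(columns, prop):
    # Reads only the single column that holds the requested property.
    return sum(columns[prop])


print(total_horizontal(rows, 2), total_vertical(columns, 2))
```

Both functions return the same sum, but the vertical version reads only the one column holding the requested property, whereas the horizontal version has to walk through every full record.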
