7.4.3 Efficiency Advantages of Using the BAT Algebra
7.4.4 Further Improvements
7.4.5 Assessment of the Benefits of Vertical Organization
7.5 Experience with SkyServer Data Warehouse Using MonetDB
7.5.1 Application Description and Planned Experiments
7.5.2 Efficient Vertical Data Access for Disk-Bound Queries
7.5.3 Improved Performance
7.5.4 Reduced Redundancy and Storage Needs
7.5.5 Flexibility
7.5.6 Use Cases Where Vertical Databases May Not Be Appropriate
7.5.7 Conclusions and Future Work
7.6 Extremely Large Databases and SciDB
7.6.1 Differences between the Requirements of Scientific and Commercial Databases
7.6.2 The Array Data Model in SciDB
7.6.2.1 Definition and Creation
7.6.2.2 Operators
7.6.3 Data Overwrite and Provenance
7.6.4 Uncertainty
7.6.5 Storage Layout
Acknowledgments
References
7.1 Introduction to Vertical Databases
7.1.1 Basic Concepts
Consider a high-energy physics experiment, where elementary particles are
accelerated to nearly the speed of light and made to collide. These collisions
generate a large number of additional particles. For each collision, called an
event, about 1-10 MB of raw data are collected. The rate of these collisions is
about 10 per second, corresponding to hundreds of millions or a few billion
events per year. Such events are also generated by large-scale simulations.
After the raw data are collected they undergo a reconstruction phase, where
each event is analyzed to determine the particles it produced and to extract
hundreds of summary properties (such as the total energy of the event, momentum,
and number of particles of each type).
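The event counts quoted above can be checked with a quick back-of-the-envelope calculation. The sketch below is illustrative only: it assumes, hypothetically, continuous year-round operation (a duty cycle of 1.0) and decimal units (1 PB = 10^9 MB), and the function names are our own rather than anything from the experiments described.

```python
# Back-of-the-envelope estimate of yearly event counts and raw data volume
# for the collision rate quoted in the text. Assumes (hypothetically) that
# the accelerator runs continuously all year; real duty cycles are lower.

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years


def events_per_year(rate_hz: float, duty_cycle: float = 1.0) -> float:
    """Number of events collected per year at a given collision rate."""
    return rate_hz * SECONDS_PER_YEAR * duty_cycle


def raw_volume_pb(n_events: float, mb_per_event: float) -> float:
    """Raw data volume in petabytes (decimal units: 1 PB = 10**9 MB)."""
    return n_events * mb_per_event / 1e9


n = events_per_year(10)       # about 10 collisions per second
low = raw_volume_pb(n, 1)     # 1 MB of raw data per event
high = raw_volume_pb(n, 10)   # 10 MB of raw data per event

print(f"events/year: {n:.3g}")
print(f"raw data/year: {low:.2f} to {high:.2f} PB")
```

At 10 events per second this gives roughly 3.2 × 10^8 events per year, consistent with the "hundreds of millions" in the text, and on the order of 0.3 to 3 PB of raw data per year; a duty cycle below 1.0 scales both figures down proportionally.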
To illustrate the concept of vertical versus horizontal organization of data,
consider a dataset of a billion events, each having 200 properties, with values
labeled $V_{0,1}$, $V_{0,2}$, and so on. Conceptually, the entire collection of summary
data can be represented as a table with a billion rows and 200 columns as