Database Reference
chunks. This allows the data management system to treat access control
in a much more optimistic manner than is possible with traditional
DBMSs. This feature will be particularly important as data management
systems evolve to take advantage of multicore architectures and
clusters of such multicore computers, where concurrent access to data
is a necessity.
More discussion about the differences between scientific and commercial
DBMSs is presented in Section 7.6 in the context of SciDB.
6.3 A Taxonomy of Index Methods
An access method defines a data organization, the data structures, and the
algorithms for accessing individual data items that satisfy some query criteria.
For example, given N records, each with k attributes, one very simple access
method is that of a sequential scan. The records are stored in N consecutive
locations, and for any query the entire set of records is examined one after the
other. For each record, the query condition is evaluated; and if the condition
is satisfied, the record is reported as a hit of the query. The data organization
for such a sequential scan is called the heap. A general strategy to accelerate
this process is to augment the heap with an index scheme.
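The sequential scan described above can be sketched in a few lines. This is a minimal illustration, not an implementation from any particular DBMS: the name sequential_scan, the tuple-based record layout, and the sample data are all assumptions made for the example.

```python
from typing import Callable, Iterator, Sequence, Tuple

Record = Tuple  # one record holding k attribute values (hypothetical layout)


def sequential_scan(
    heap: Sequence[Record],
    condition: Callable[[Record], bool],
) -> Iterator[Record]:
    """Examine every record in the heap, one after the other, and yield the hits."""
    for record in heap:
        # Evaluate the query condition on each record in storage order.
        if condition(record):
            yield record


# N = 4 records with k = 3 attributes, stored in consecutive locations.
heap = [
    (1, 20.5, "a"),
    (2, 31.0, "b"),
    (3, 18.2, "c"),
    (4, 35.7, "d"),
]

# Query: report every record whose second attribute exceeds 30.
hits = list(sequential_scan(heap, lambda r: r[1] > 30.0))
```

Every query touches all N records regardless of how many qualify, which is the cost an index scheme is meant to reduce.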
An index scheme is the data structure and its associated algorithms that
improve data access operations such as insertions, deletions, retrievals, and
query processing. The choice of an index scheme for accessing a dataset
depends heavily on a number of factors, including the following:
Dataset Size: One factor is whether the data can be contained entirely in
memory or not. Since our focus is on massively large scientific datasets,
we will assume the latter with some consideration for main memory
indexes when necessary.
Data Organization: The datasets may be organized into fixed-size data
blocks (also referred to as data chunks or buckets at times). A data
block is typically defined as a multiple of the physical page size of disk
storage. A data organization may be defined to allow for future inser-
tions and deletions without impacting the speed of accessing data by the
index scheme. On the other hand, the data may be organized and constrained
to be read-only, append-only, or both. Another influencing data
organization factor is whether the records are of fixed length or variable
length. Of particular interest in scientific datasets are those datasets
that are mapped into very large k-dimensional arrays. To partition the
array into manageable units for transferring between memory and disk
storage, fixed-size subarrays called chunks are used. Examples of such
data organization methods are NetCDF [61], HDF5 [42], and FITS [41].
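To make the chunking idea concrete, the following sketch maps a cell of a k-dimensional array to the chunk that holds it. It is an illustration under stated assumptions only: the function names, the array and chunk shapes, and the row-major numbering of chunks are hypothetical and do not describe the actual on-disk layout of NetCDF, HDF5, or FITS.

```python
from math import ceil


def chunk_grid(array_shape, chunk_shape):
    """Number of chunks along each dimension (edge chunks may be partly empty)."""
    return tuple(ceil(n / c) for n, c in zip(array_shape, chunk_shape))


def locate(index, chunk_shape):
    """Split a global cell index into (chunk coordinate, offset inside the chunk)."""
    chunk_coord = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk_coord, offset


def chunk_number(chunk_coord, grid):
    """Linearize a chunk coordinate in row-major order (an assumed convention)."""
    number = 0
    for coord, extent in zip(chunk_coord, grid):
        number = number * extent + coord
    return number


array_shape = (1000, 1000, 50)   # hypothetical 3-dimensional array
chunk_shape = (100, 100, 10)     # fixed-size subarray moved between disk and memory

grid = chunk_grid(array_shape, chunk_shape)
coord, offset = locate((250, 999, 37), chunk_shape)
number = chunk_number(coord, grid)
```

Here the cell (250, 999, 37) falls in chunk coordinate (2, 9, 3) at offset (50, 99, 7), so reading it requires transferring only that one chunk rather than the whole array.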