Preparing Array Analytics for the Data Tsunami - Geographical Information Systems: Trends and Technologies


span over several storage media. To establish quick access patterns, arrays

are partitioned into tiles or chunks of convenient size, which are the basic access

units during query evaluation (Baumann 1994). Additional geo indexes

help locate the required tiles efficiently.
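To make the role of tiles as access units concrete, here is a minimal sketch that works out which tiles a range query touches. It assumes a regular grid tiling with a fixed tile shape; the function name `tiles_for_query` and the example coordinates are hypothetical, and real systems may use irregular tilings together with a spatial index.

```python
from itertools import product

def tiles_for_query(lo, hi, tile_shape):
    """Return index tuples of all tiles intersecting the box [lo, hi] (inclusive)."""
    ranges = [
        range(l // t, h // t + 1)  # first to last tile index along this axis
        for l, h, t in zip(lo, hi, tile_shape)
    ]
    return list(product(*ranges))

# Hypothetical 2-D array cut into 100 x 100 tiles;
# the query covers cells 150..349 on axis 0 and 80..120 on axis 1.
tiles = tiles_for_query(lo=(150, 80), hi=(349, 120), tile_shape=(100, 100))
print(tiles)
```

Only the tiles returned here would have to be read from storage; all other tiles of the array stay untouched.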

Array query languages, like SQL, give declarative access to arrays.

These queries are parsed, optimized, and executed to create, manipulate,

search, and delete arrays in flexible ways. The parser receives the query and

generates the operation tree. Then, algebraic optimization rules are applied

to the query tree where applicable. Without parallelism,

execution addresses the tiles sequentially. This tile-by-tile processing

strategy leads to an architecture that lets servers process arrays orders

of magnitude larger than main memory.
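The tile-by-tile strategy can be sketched as a loop that holds only one tile in memory at a time and combines per-tile partial results. This is an illustration under simplifying assumptions, not the evaluation engine of any particular Array DBMS: `iter_tiles` is a hypothetical stand-in for reading tiles from storage, and the array is one-dimensional with cell `i` holding the value `i`.

```python
def iter_tiles(n_cells, tile_size):
    """Yield one tile at a time; only the current tile is held in memory."""
    for start in range(0, n_cells, tile_size):
        stop = min(start + tile_size, n_cells)
        # Stand-in for a disk read: cell i simply holds the value i.
        yield list(range(start, stop))

def tiled_mean(n_cells, tile_size):
    """Average over the whole array by combining per-tile partial results."""
    total, count = 0, 0
    for tile in iter_tiles(n_cells, tile_size):
        total += sum(tile)
        count += len(tile)
    return total / count

mean = tiled_mean(n_cells=1_000_000, tile_size=4096)
print(mean)
```

Peak memory depends on the tile size rather than on the array size, which is why the same loop would work for an array far larger than main memory.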

Extensions are made for achieving scalability. Normally, scalability on a

single machine is guided by parameters like the number of processor cores

and the amount of main memory. The trends in processor development are

towards increasing the number of cores in one chip, rather than increasing

the power of a single core. By processing each tile on separate nodes or cores,

parallel processing becomes a critical development paradigm for scalable

software, allowing full utilization of these new processor architectures.
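A minimal sketch of processing tiles concurrently and then merging the partial results follows. It uses a thread pool from the Python standard library for portability; as an assumption worth stating, CPU-bound work in CPython would normally need `ProcessPoolExecutor` instead to occupy several cores. The tile contents and the names `tile_max` and `parallel_max` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def tile_max(tile):
    """Per-tile partial result; each call is independent of the others."""
    return max(tile)

def parallel_max(tiles, workers=4):
    """Evaluate every tile concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(tile_max, tiles))
    return max(partials)

# Eight hypothetical tiles of 1000 cells each.
tiles = [list(range(i * 1000, (i + 1) * 1000)) for i in range(8)]
result = parallel_max(tiles)
print(result)
```

Because no tile depends on another, the work splits cleanly; the only sequential step is the final merge of the partial maxima.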

At a certain point, the hardware resources of a single machine will not be

enough to handle all tasks. It becomes necessary to distribute the data and

workload to further machines (nodes) according to some strategy. Multiple

machines present new challenges, however, such as limited connection speed

between nodes, optimizing data distribution, and minimizing data movement

between nodes. This drives database development towards distributed

and cloud computing. When data duplication would be inevitable with

the standard storage management mechanisms, the in situ processing

capability is an alternative way of adding value to legacy systems and

preserving scalability.
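As an illustration of how a distribution strategy determines data movement, the sketch below places tiles on nodes round-robin and lists the tiles a query would have to fetch from other nodes. Round-robin is only one possible strategy, chosen here as an assumption; the text does not prescribe one, and all names and numbers are hypothetical.

```python
def place_round_robin(tile_ids, n_nodes):
    """Map each tile id to a node, cycling through the nodes."""
    return {tile: i % n_nodes for i, tile in enumerate(tile_ids)}

def tiles_to_move(placement, needed_tiles, local_node):
    """Tiles a query needs that are not stored on the node executing it."""
    return [t for t in needed_tiles if placement[t] != local_node]

tile_ids = list(range(12))
placement = place_round_robin(tile_ids, n_nodes=3)

# A query running on node 0 that needs tiles 0..3.
remote = tiles_to_move(placement, needed_tiles=[0, 1, 2, 3], local_node=0)
print(remote)
```

Round-robin spreads load evenly but scatters neighbouring tiles, so a query over adjacent tiles pulls data across the network; a placement that keeps neighbouring tiles together would trade some balance for less movement.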

Parallel processing

Parallel databases seek to improve performance by parallelizing all steps

involved in the query evaluation whenever possible. Parallel processing

in the context of Array DBMS specifically means data parallelism (Hahn et

al. 2002), which focuses on data distribution across different computing

nodes for parallel evaluation. Pipeline parallelism, which is widely used in

RDBMS, is not particularly suitable, as the granularity of query evaluation is

much larger with array tiles and a pipeline buffer would quickly overflow

the main memory (Hahn et al. 2002).
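The granularity argument can be made tangible with a rough calculation. All three figures below (buffer depth, row size, tile size) are hypothetical values picked for illustration, not measurements from the cited work.

```python
def buffer_bytes(slots, item_bytes):
    """Memory held by a pipeline buffer with a fixed number of slots."""
    return slots * item_bytes

SLOTS = 1024                  # hypothetical buffer depth
TUPLE_BYTES = 128             # hypothetical relational row size
TILE_BYTES = 4 * 1024 * 1024  # hypothetical 4 MiB array tile

row_buffer = buffer_bytes(SLOTS, TUPLE_BYTES)
tile_buffer = buffer_bytes(SLOTS, TILE_BYTES)
print(row_buffer // 1024, "KiB vs", tile_buffer // 1024**3, "GiB")
```

With these assumed sizes, the same buffer depth that costs kilobytes for rows costs gigabytes for tiles, and that is per operator in the pipeline.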

Parallel DBMS typically exploit one of the following three architectures.

In shared-memory systems, CPUs are interconnected and have access to a

common memory region. CPUs in a shared-disk architecture have access
