der to obtain data about sales in Paris and, then,
individually on each of its arrondissements.
In order to reduce data movement across sites
(improving the system's performance) the grid-
based DW tables are physically distributed across
different sites. Such distribution is represented at a
Global Physical Schema (GPS). Ideally, the physi-
cally used data distribution strategy is transparent
to users, who should submit queries against
a unified Logical Model (LM).
Grids are highly heterogeneous environments.
At each site, different types of resources may be
available (like shared-nothing and shared-disk
parallel machines, for example). It is difficult
to find a single intra-site allocation strategy
that is optimal in all of these situations.
Therefore, each site may use its own local
physical allocation strategy, e.g. Multi-Dimensional
Hierarchical Fragmentation, MDHF (Stöhr et
al., 2000), or the Node-Partitioned Data Warehouse
strategy, NPDW (Furtado, 2004). Each site's
existing relations are represented in a Local Site
Physical Schema (LSPS). This assumption fits
well with the idea of domain autonomy, which is
one of the grid's characteristics.
In the generic grid-based DW, nodes from
any site can load data to the database. But the
same data cannot be loaded from distinct sites.
This leads to the idea that each piece of data has
a single site (which we call its Data Source Site)
that is its primary source. In order to reduce data
movement across the grid's sites (considering the
abovementioned geographically related access
patterns), each site should maintain a copy of the
facts data it has loaded into the DW (in this chapter,
we consider that tables in the LM are organized in
a star schema). This generates a globally physically
partitioned facts table which uses the values of a
site source attribute as partitioning criteria.
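As a minimal sketch of this idea, the Python fragment below groups fact rows by a site source attribute, so that each site ends up holding the rows it loaded. The row layout, the attribute name source_site, and the site names are assumptions made for illustration only; they are not taken from the chapter.

```python
from collections import defaultdict

# Hypothetical fact rows; "source_site" plays the role of the
# site source attribute described in the text.
FACTS = [
    {"sale_id": 1, "source_site": "paris", "amount": 120.0},
    {"sale_id": 2, "source_site": "lisbon", "amount": 75.5},
    {"sale_id": 3, "source_site": "paris", "amount": 42.0},
]


def partition_by_source_site(rows):
    """Return one fact fragment per Data Source Site."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row["source_site"]].append(row)
    return dict(fragments)


fragments = partition_by_source_site(FACTS)

# A query restricted to one site's data only reads that site's fragment.
paris_total = sum(row["amount"] for row in fragments["paris"])
```

In this sketch a query about Paris sales reads only the "paris" fragment, which mirrors the goal of keeping geographically related accesses local.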
Depending on the implementation, the site
source attribute values may be combined with
values of other existing dimensions. In fact,
repartitioning each site source-based fragment
of the facts table into several smaller fragments
can benefit the system in several ways. For instance,
each smaller fragment can be replicated to
distinct sites, which would increase
the system's degree of parallelism. Besides that,
depending on the selection predicate, some queries
may access only a set of the smaller fragments,
which would be faster than accessing the whole
original site source-based fragment. These two
situations are represented in Figure 2. Therefore,
even at the global level, other partitioning criteria
should be used together with the site source
attribute. Including the most frequently used
equi-join attributes in the partitioning criteria
for the facts table can improve performance by
reducing data movement across sites when
executing queries, as it does in shared-nothing
parallel machines (Furtado, 2004b).
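The effect of combining the site source attribute with another attribute can be sketched as follows. This is an illustrative example under stated assumptions, not the chapter's implementation: the second partitioning attribute (year), the helper names fragment and prune, and the sample rows are all hypothetical. It shows how an equality selection predicate lets a query skip fragments that cannot contain matching rows.

```python
from collections import defaultdict

# Hypothetical fact rows; "year" stands in for a second partitioning attribute.
FACTS = [
    {"source_site": "paris", "year": 2007, "amount": 10.0},
    {"source_site": "paris", "year": 2008, "amount": 20.0},
    {"source_site": "lisbon", "year": 2008, "amount": 5.0},
    {"source_site": "paris", "year": 2008, "amount": 7.0},
]


def fragment(rows, key_attrs):
    """Group rows into fragments keyed by the values of key_attrs."""
    out = defaultdict(list)
    for row in rows:
        out[tuple(row[a] for a in key_attrs)].append(row)
    return dict(out)


def prune(fragments, key_attrs, predicate):
    """Keep only the fragments whose key satisfies an equality predicate.

    predicate maps attribute name -> required value; every attribute in it
    must be one of key_attrs (an assumption of this sketch).
    """
    kept = {}
    for key, rows in fragments.items():
        key_values = dict(zip(key_attrs, key))
        if all(key_values[a] == v for a, v in predicate.items()):
            kept[key] = rows
    return kept


KEY = ("source_site", "year")
fragments = fragment(FACTS, KEY)

# Only the (paris, 2008) fragment has to be read for this query.
needed = prune(fragments, KEY, {"source_site": "paris", "year": 2008})
total = sum(row["amount"] for rows in needed.values() for row in rows)
```

Because each smaller fragment is an independent unit, it could also be replicated to a different site, which is the other benefit mentioned above.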
Besides facts table's partitions, each site should
also store dimension tables' data. Full replication
of dimension tables across all sites may be done
to reduce inter-site data movement during query
execution and to improve data availability. Such
a strategy is feasible when dimension tables are small
(this also facilitates system management). But when
large dimension tables are present, they can be
fragmented at both the intra-site and inter-site levels
in order to improve performance and QoS levels.
The intra-site dimension table fragmentation strategy
depends on the locally chosen physical allocation
strategy (which in turn depends on the type of locally
available resources, as discussed earlier). Inter-site
fragmentation of large dimension tables should
follow a strategy similar to that used for the facts
table: initially, dimension data should remain at
its Data Source Site, and inter-site replication
is done when necessary. Derived partitioning
of the facts table can also be done, improving the
system's performance because join operations can be
broken into subjoins that are executed in parallel
at distinct sites. Although the use of derived
partitioning of the facts table depends on the
semantics of the stored data, this kind of partitioning
should be used together with the aforementioned
partitioning based on the site source attribute.
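Derived partitioning and the resulting subjoins can be illustrated with a small sketch. Everything named here is an assumption for illustration: the site names, the customer dimension, and the functions derive_fact_fragments and subjoin are hypothetical, and threads merely stand in for grid sites. Each fact row is placed with the dimension fragment it joins with, so the full join becomes the union of local subjoins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dimension fragments, one per site (customer_id -> city).
DIM_FRAGMENTS = {
    "site_a": {1: "Paris", 2: "Lyon"},
    "site_b": {3: "Lisbon"},
}

# Hypothetical fact rows referencing the dimension by customer_id.
FACTS = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 3, "amount": 5.0},
    {"customer_id": 2, "amount": 8.0},
    {"customer_id": 1, "amount": 2.0},
]


def derive_fact_fragments(facts, dim_fragments):
    """Place each fact row at the site holding the dimension row it joins with."""
    owner = {cid: site for site, dim in dim_fragments.items() for cid in dim}
    fact_fragments = {site: [] for site in dim_fragments}
    for row in facts:
        fact_fragments[owner[row["customer_id"]]].append(row)
    return fact_fragments


def subjoin(site, fact_rows, dim_rows):
    """Join one fact fragment with its co-located dimension fragment."""
    return [(site, dim_rows[r["customer_id"]], r["amount"]) for r in fact_rows]


fact_fragments = derive_fact_fragments(FACTS, DIM_FRAGMENTS)

# Threads stand in for sites here; each subjoin touches only local data.
with ThreadPoolExecutor() as pool:
    futures = [
        pool.submit(subjoin, site, fact_fragments[site], DIM_FRAGMENTS[site])
        for site in DIM_FRAGMENTS
    ]
    joined = [row for future in futures for row in future.result()]
```

No subjoin in this sketch needs rows from another site, which is the property that makes derived partitioning attractive when the data semantics allow it.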