infrastructure. Each site may share one or more
resources with the grid. Examples of possible shared
resources are storage systems, computer clusters
and supercomputers.
Data warehouses are usually deployed at a single
site. However, this may not be the most effective
layout in a grid-based DW implementation. In fact,
in such an environment, placing the entire database
at a single site would be more expensive and
time-consuming than creating a distributed DW that
uses the available distributed resources to store the
database and to execute users' queries. It is
important to consider not only that users are
distributed across distinct grid sites but also that
the warehouse's data may be loaded from several sites.
Hence, in the distributed grid-based DW,
data is partitioned and/or replicated at nodes
from distinct sites and may be queried by any
grid participant.
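
The placement idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the placement method of any system discussed in the text: the function names `place_rows` and `query`, the site names, the sample sales rows and the hash-based site choice are all assumptions made for the example. It shows rows being partitioned across sites (one copy each) or partially replicated (several copies each), and a query that any participant could run over all sites.

```python
import zlib
from collections import defaultdict

def place_rows(rows, sites, key, copies=1):
    """Map each site to the rows it stores (hypothetical helper).

    A row's first site is chosen by hashing row[key]; further copies go to
    the next sites in the list, so copies=len(sites) means full replication.
    """
    if not 1 <= copies <= len(sites):
        raise ValueError("copies must be between 1 and len(sites)")
    placement = defaultdict(list)
    for row in rows:
        # crc32 is used instead of hash() so the placement is repeatable.
        first = zlib.crc32(str(row[key]).encode()) % len(sites)
        for offset in range(copies):
            placement[sites[(first + offset) % len(sites)]].append(row)
    return placement

def query(placement, predicate):
    """Scan every site and return matching rows, dropping replica duplicates.

    Assumes rows are distinguishable (here by a unique sale_id).
    """
    seen, result = set(), []
    for site_rows in placement.values():
        for row in site_rows:
            ident = tuple(sorted(row.items()))
            if predicate(row) and ident not in seen:
                seen.add(ident)
                result.append(row)
    return result

# Assumed sample data and site names, for illustration only.
sales = [
    {"sale_id": 1, "country": "USA", "revenue": 120},
    {"sale_id": 2, "country": "France", "revenue": 80},
    {"sale_id": 3, "country": "USA", "revenue": 50},
]
sites = ["site_a", "site_b", "site_c"]

partitioned = place_rows(sales, sites, key="sale_id", copies=1)
replicated = place_rows(sales, sites, key="sale_id", copies=2)
usa_revenue = sum(
    r["revenue"] for r in query(replicated, lambda r: r["country"] == "USA")
)
```

With `copies=1` each row lives at exactly one site (pure partitioning); larger values trade storage space for availability, which is the trade-off the text goes on to discuss.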
Best-Effort Approaches for
Grid-Enabled Warehouses
There are some previous works on implementing
and using grid-enabled data warehouses, but most
use best-effort oriented approaches, which may
not be the most adequate in grid-based systems
(as presented in the previous Section, grid
scheduling is usually satisfaction-oriented).
High availability and high performance are the
main concerns of Costa & Furtado (2006). Each
participating site stores a partitioned copy of the
entire warehouse. Intra-site parallelism is obtained
by the use of the Node Partitioned Data Warehouse
(NPDW) strategy (Furtado, 2004). A hierarchical
scheduler architecture is used together with an
on-demand scheduling policy (idle nodes ask
the Central Scheduler for new queries to execute).
Such a model leads to good performance and high
availability, but also consumes too much storage
space, as the whole warehouse is present at each
participating site.
The Olap-enabled grid (Lawrence & Rau-Chaplin,
2006; Dehne et al., 2007) is a two-tier
grid-enabled warehouse. The users' local domain
is considered the first tier and stores cached data.
Database servers at remote sites compose the
second tier. The scheduling algorithm tries to
use the locally stored data to answer submitted
queries. If that is not possible, then remote servers
are accessed.
The Globus Toolkit is used by Wehrle et al.
(2007) as an underlying infrastructure to implement
a grid-enabled warehouse. Fact table data
is partitioned across the participating nodes
and dimension data is replicated. Some specialized
services are used at each node: (i) an index
service provides information about locally stored
data; and (ii) a communication service is used to
access remote data. Locally stored data is used to
answer incoming queries. If the searched data is
not stored at the local node, then it is accessed
remotely through the communication service.
This strategy and the abovementioned Olap-enabled
strategy do not provide any autonomy
for local domains.
Distributed Data Placement
in QoS-Oriented DW
In data warehouses, users' queries usually follow
some kind of access pattern, such as geographically
related ones, in which users from a location may
have more interest in data related to that location
than in data about other locations (Deshpande
et al., 1998). This may also apply to the
grid. For instance, consider a global organization
that uses a grid-based DW about sales, which is
accessed by users from several countries. The users
in New York City, USA, may start by querying data
about sales revenue in Manhattan, and then perform
successive drill-up operations in order to obtain
information about sales in New York City, in New
York State and, finally, in the USA. Only rarely
would New York users query data about sales in
France. In the same way, users from Paris may
start querying the database about sales in France,
and then start doing drill-down operations in or-