tions from distinct domains (Nieto-Santisteban et
al, 2005; Watson, 2001).
On the other hand, data warehouses are mostly
read-only databases which store historical data
that is commonly used for decision support and
knowledge discovery (Chaudhuri & Dayal, 1997).
Grid-based data warehouses are useful in many
real and virtual global organizations which are
generating huge volumes of distributed data.
In such a context, the data warehouse is a highly
distributed database whose data may be loaded
from distinct sites and which users from distinct
domains should be able to query transparently.
But constructing effective grid-based applications
is not simple. Grids are usually very heterogeneous
environments composed of resources that may
belong to distinct organizational domains. Each
domain administrator may have a certain degree
of autonomy and impose local resource usage
constraints on remote users (Foster, 2001).
Such site autonomy is reflected in the scheduling
algorithms and scheduler architectures. The
hierarchical architecture is one of the most
commonly used scheduling architectures in Grids
(Krauter et al., 2002). In such an architecture,
a Community Scheduler (or Resource Broker) is
responsible for transforming submitted jobs into
tasks and assigning them to sites for execution.
At each site, a Local Scheduler manages local
queues and implements local domain scheduling
policies. Such an architecture enables a certain
degree of site autonomy.
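The division of labour between the two scheduler levels can be illustrated with a minimal sketch. It is not the chapter's implementation: the class names, the `max_remote_tasks` limit and the least-loaded-first assignment rule are all assumptions introduced here for illustration only.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LocalScheduler:
    """Manages one site's queue and enforces a local policy (hypothetical)."""
    site: str
    max_remote_tasks: int  # assumed local constraint imposed on remote users
    queue: deque = field(default_factory=deque)

    def accept(self, task: str) -> bool:
        # Site autonomy: the site may refuse work beyond its own limit.
        if len(self.queue) >= self.max_remote_tasks:
            return False
        self.queue.append(task)
        return True


class CommunityScheduler:
    """Splits a job into tasks and assigns them to sites (hypothetical)."""

    def __init__(self, sites):
        self.sites = list(sites)

    def submit(self, job_id: str, n_tasks: int) -> dict:
        placement = {}
        for i in range(n_tasks):
            task = f"{job_id}.t{i}"
            # Assumed rule: try the least-loaded site first, then the others.
            for site in sorted(self.sites, key=lambda s: len(s.queue)):
                if site.accept(task):
                    placement[task] = site.site
                    break
            else:
                placement[task] = None  # every site refused the task
        return placement


sites = [
    LocalScheduler("A", max_remote_tasks=2),
    LocalScheduler("B", max_remote_tasks=1),
]
broker = CommunityScheduler(sites)
placement = broker.submit("q1", n_tasks=4)
```

In this sketch the Community Scheduler only proposes placements, and each Local Scheduler keeps the final say, which is one simple way to model the site autonomy described above.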
Besides that, in Grids, tasks are usually specified
together with Service Level Objectives (SLOs)
or Quality-of-Service (QoS) requirements. In fact,
in many Grid systems, scheduling is QoS-oriented
instead of performance-oriented (Roy & Sander,
2004). In such situations, the main objective is
to increase user satisfaction instead of achieving
high performance. Hence, the user-specified
SLOs may be used by the Community Scheduler
to negotiate the establishment of Service Level
Agreements (SLAs) with Local Schedulers. But
SLOs can also be used to provide some kind of
differentiation among users or jobs. Execution
deadlines and execution cost limits are examples
of commonly used SLOs.
We consider here the use of deadline-marked
queries in grid-based Data Warehouses. In such a
context, execution time objectives can provide
some differentiation between interactive queries
and report queries. For example, one can establish
that interactive queries should be executed within
a 20-second deadline and that report queries should
be executed within 5 minutes. In fact, different
deadlines may be specified in several ways, such
as creating privileged groups of users that should
obtain responses sooner, or providing shorter
deadlines for queries submitted by users affiliated
with institutions that have contributed more
resources to the considered grid-based data
warehouse.
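As a rough illustration of deadline marking, the sketch below attaches the 20-second and 5-minute deadlines from the example above to queries by class and then orders the pending queries by earliest deadline. The earliest-deadline-first ordering, the `Query` type and the function names are assumptions made for this example; the text does not state that this is the scheduling policy actually used.

```python
import heapq
from dataclasses import dataclass, field

# Deadlines in seconds per query class, taken from the example in the text.
DEADLINE_BY_CLASS = {"interactive": 20.0, "report": 300.0}


@dataclass(order=True)
class Query:
    absolute_deadline: float  # the only field used for ordering
    query_id: str = field(compare=False)
    query_class: str = field(compare=False)


def mark(query_id: str, query_class: str, submitted_at: float) -> Query:
    """Attach an absolute deadline to a query based on its class."""
    deadline = submitted_at + DEADLINE_BY_CLASS[query_class]
    return Query(deadline, query_id, query_class)


def edf_order(queries):
    """Return query ids by earliest absolute deadline (assumed policy)."""
    heap = list(queries)
    heapq.heapify(heap)
    return [heapq.heappop(heap).query_id for _ in range(len(heap))]


pending = [
    mark("r1", "report", submitted_at=0.0),        # deadline at t = 300 s
    mark("i1", "interactive", submitted_at=10.0),  # deadline at t = 30 s
    mark("i2", "interactive", submitted_at=5.0),   # deadline at t = 25 s
]
order = edf_order(pending)
```

Per-group or per-institution deadlines, as mentioned above, could be modelled in the same way by keying the deadline table on a user group rather than on the query class.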
Data placement is a key issue in grid-based
applications. Due to the grid's heterogeneity and
to the high cost of moving data across different
sites, data replication is commonly used to improve
performance and availability (Ranganathan &
Foster, 2004). But most work on replica selection
and creation in data grids considers generic file
replication (e.g., Lin et al., 2006; Siva Sathya
et al., 2006; Haddad & Slimani, 2007).
Therefore, the use of specialized data placement
strategies for the deployment of data warehouses
in grids remains an open issue.
In this chapter, we discuss the implementation
of QoS-oriented Grid-enabled Data Warehouses.
The grid-enabled DW is composed of a set of
grid-enabled database management systems, a set
of tools provided by an underlying grid resource
management (GRM) system, and hierarchical
schedulers. We combine data partitioning and
replication, constructing a highly distributed
database that is stored across the grid's sites,
and use QoS-oriented scheduling and a specialized
replica selection and placement strategy to achieve
high QoS levels.
This chapter is organized as follows: in the
next Section we present some background on data