remote data is fetched during job execution. In
the DP strategy, each job is assigned to a node that
stores the job's required input data. Ranganathan
& Foster claim that, in most situations, DP
outperforms LLS and RS, since moving data
across the grid's nodes can be very time consuming.
Several parameters should be considered when
scheduling data-centric jobs. These include the
size of the job's input and output data and the
network bandwidth among the grid's nodes. Park
& Kim (2003) present a cost model that uses such
parameters to estimate a job's execution time at
each node (considering both whether the job is
executed at the submission site or elsewhere, and
whether it uses local or remote data as input).
The job is then scheduled to the node with the
lowest predicted execution time.
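A minimal sketch of this kind of cost model: predicted time at a node is computation time plus the time to transfer remote input over the available bandwidth, and the scheduler picks the minimum. The formula and parameter names here are illustrative assumptions, not Park & Kim's exact model.

```python
def predicted_time(compute_time, input_size, bandwidth):
    """compute_time: seconds of pure computation at the node;
    input_size: MB of input to fetch; bandwidth: MB/s to the data
    source, or None when the input is already stored locally."""
    transfer = 0.0 if bandwidth is None else input_size / bandwidth
    return compute_time + transfer

def schedule(candidates):
    """candidates: dict node -> (compute_time, input_size, bandwidth).
    Returns the node with the lowest predicted execution time."""
    return min(candidates, key=lambda n: predicted_time(*candidates[n]))
```

Note how a fast remote node can beat the node holding the data locally once the transfer time is small enough, which is exactly the trade-off the cost model captures.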
Although very promising, grid-enabled
database management systems were not widely
adopted for a long time (Nieto-Santisteban et al,
2005; Watson, 2001). Watson (2001) proposed the
construction of a federated system using
ODBC/JDBC as an interface to heterogeneous
database systems. In more recent work, web services
are used as an interface to database management
systems. Alpdemir et al (2003) present an Open
Grid Services Architecture (OGSA; Foster et al,
2002)-compatible implementation of a distributed
query processor (Polar*), in which a distributed
query execution plan is built from basic operations
that are executed at several nodes.
Costa & Furtado (2008c) compare the use
of centralized and hierarchical query scheduling
strategies in grid-enabled databases. The
authors show that hierarchical schedulers can
be used without significant loss in system
performance and can also achieve good levels of
compliance with Service Level Objectives (SLOs).
In Costa & Furtado (2008b), the authors propose
the use of reputation systems to schedule deadline-
marked queries among grid-enabled databases
when several replicas of the same data are present
at distinct sites.
In Grids, data replicas are commonly used to
improve job (or query) execution performance
and data availability. Best Client and Cascading
Replication are among the dynamic file replication
strategies evaluated by Ranganathan & Foster
(2001) for use in the Grid. In both models, a
new file replica is created whenever the number
of accesses to an existing data file exceeds a
threshold value. The difference between the
methods lies in where the new replica is placed.
The 'best client' of a certain data file is the node
that has requested it the most times in a certain
time period. In the Best Client placement strategy,
the new replica is placed at the best client node.
In the Cascading Replication method, the new
file is placed at the first node in the path between
the node that stores the file being replicated and
the best client node.
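The two placement rules can be sketched together: count accesses in the observation window, and once the busiest client crosses the threshold, place the replica either at that client (Best Client) or at the first hop toward it (Cascading). The threshold value and all names below are illustrative assumptions.

```python
from collections import Counter

THRESHOLD = 3  # illustrative access-count threshold

def choose_placement(access_log, path_to_best_client, strategy):
    """access_log: list of node names that requested the file in the
    current time window. path_to_best_client: nodes on the path from
    the file's current holder to the best client (holder excluded).
    Returns the node where a new replica is created, or None when
    the access count is still below the threshold."""
    best_client, hits = Counter(access_log).most_common(1)[0]
    if hits <= THRESHOLD:
        return None
    if strategy == "best_client":
        return best_client
    if strategy == "cascading":
        return path_to_best_client[0]  # first node toward the best client
    raise ValueError(f"unknown strategy: {strategy}")
```

Cascading thus places replicas progressively down the path, so intermediate nodes can also serve other clients in the same subtree, while Best Client optimizes only for the single heaviest requester.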
The Best Client strategy served as inspiration
for the Best Replica Site strategy (Siva
Sathya et al, 2006). The main difference between
this strategy and the original Best Client is
that in Best Replica Site the site at which the
replica is created is chosen considering not only
the number of accesses from clients to the dataset,
but also the replica's expected utility for each site
and the distance between sites. Siva Sathya et al (2006)
also propose two other strategies: Cost Effective
Replication and Topology Based Replication. In
the first, a cost function is used to choose the
site at which a replica should be created (the cost
function evaluates the cost of accessing a replica
at each site). In the latter, database replicas are
created at the node that has the greatest number
of direct connections to other nodes.
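The Topology Based rule reduces to choosing the node of highest degree in the grid's connection graph. A minimal sketch, assuming the topology is given as an adjacency mapping (the representation is an assumption, not from the original paper):

```python
def topology_based_site(adjacency):
    """adjacency: dict node -> set of directly connected nodes.
    Returns the node with the most direct connections (highest
    degree), i.e. the replica site under Topology Based Replication."""
    return max(adjacency, key=lambda node: len(adjacency[node]))
```

Placing the replica at a high-degree node keeps it one hop away from the largest number of neighbors, which is the intuition behind the strategy.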
Topology-related aspects are also considered
by Lin et al (2006) in order to choose replica
locations. The authors consider a hierarchical (tree-like)
grid in which the database is placed at the tree root.
Whenever a job is submitted, the scheduler looks
for the accessed data at the node where the job
was submitted. If the necessary data is not at that
node, the scheduler asks for it at the node's
parent node. If the parent node does not have a