evaluated site (at this point we consider that replicas of input datasets are created; later in this chapter, we discuss how to determine whether such datasets are fact table fragments or computed chunks).
In order to implement such a policy, when the Local Scheduler cannot execute a task (query) within the specified time limit, it should evaluate whether it would achieve the specified deadline if a replica of a given dataset were present at the site.
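This evaluation reduces to a simple predicate: estimate the task's completion time under the assumption that the dataset is already local, and compare that against the deadline. The sketch below is illustrative only; the function name, parameters, and cost model are assumptions, not the chapter's actual implementation.

```python
def evaluate_replica_benefit(now_s: float,
                             deadline_s: float,
                             exec_with_remote_s: float,
                             exec_with_local_s: float) -> bool:
    """Hypothetical deadline check performed by a Local Scheduler.

    Returns True only when the task misses its deadline with remote
    data but would meet it if the input dataset were locally stored,
    i.e. when a replica would actually make the difference.
    """
    meets_now = now_s + exec_with_remote_s <= deadline_s
    meets_with_replica = now_s + exec_with_local_s <= deadline_s
    # Report the dataset to the Replication Manager only in the
    # second case; otherwise replication brings no deadline benefit.
    return (not meets_now) and meets_with_replica
```

Note that when the deadline is already met with remote data, the predicate is false: replicating a dataset that is not on the critical path would waste storage without improving the deadline.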
When the Local Scheduler predicts that it would achieve the required task deadline if a certain replica were locally stored, it informs the Replication Manager which replica that is. In that case, the value of β for the specified dataset is incremented by a certain δ value (the benefit of the considered input dataset replica to the system), as represented in Equation 5. However, the δ value of each task should vary over time, in order to differentiate the benefit attributed to old queries from that of newer ones. Therefore, a time discount function may be used to compute δ, as presented in Equation 6.
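A minimal sketch of this update rule, assuming the time discount of Equation 6 is an exponential decay e^(−Δt/λ); the function names and the choice of seconds as the time unit are illustrative assumptions:

```python
import math

def delta(delta_t_s: float, lam_s: float) -> float:
    """Time-discounted benefit of one query for a candidate replica
    (Equation 6): recent queries contribute more than old ones."""
    return math.exp(-delta_t_s / lam_s)

def beta(query_ages_s, lam_s: float) -> float:
    """Accumulated benefit score of a dataset/site pair (Equation 5):
    the sum of the per-query discounted benefits."""
    return sum(delta(age, lam_s) for age in query_ages_s)
```

A fresh query (Δt = 0) contributes δ = 1, while older queries contribute exponentially less; λ controls how quickly the benefit of past queries fades.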
β = Σᵢ δᵢ (5)

δᵢ = e^(−Δt/λ) (6)

In Equation 6, Δt represents the time window between the execution time of the task (the query that would benefit from the input dataset replication) and the current time, and λ enables the use of different time intervals [as defined in (Huynh et al., 2006)].

Whenever β is greater than a threshold value for a certain dataset/site pair, the site is marked as a candidate to receive a replica of the considered dataset. Indeed, the replica is immediately created if there is enough disk space. Otherwise, the system has to evaluate whether some of the existing data replicas (of other datasets) should be replaced by the new replica candidate. In order to do that, the RM also maintains the benefit scores of existing dataset replicas. Such scores are computed in the same way as the β and δ values of nonexistent replicas. If the β value of an existing replica is lower than that of a replica candidate, a replica replacement is performed. Otherwise, the system keeps the already existing replicas.

LOCAL CACHING AND INTRA-SITE REPLICA CANDIDATES

Fact table fragments are natural candidates for inter-site replication, as discussed previously in this chapter. When a Local Scheduler determines that a certain deadline would be achieved if its site stored a local copy of a certain fragment, it informs the Replication Manager, which considers that fragment as a dataset that is a candidate for inter-site replication. The fragment may or may not be replicated, depending on its benefit to the system.

As discussed before, each site is autonomous to implement its own data placement (and replication) strategy. Besides that, each site may also implement its own data caching mechanism. Some caching mechanisms benefit from the multidimensional nature of warehouse data. Chunk-based caching (Deshpande et al., 1998; Deshpande & Naughton, 2000) is one of those mechanisms specialized for the DW.

In chunk-based caching, DW data to be stored in the cache is broken up into chunks, which are cached and used to answer incoming queries. The list of chunks necessary to answer a query is split in two: (i) chunks that may be obtained (or computed) from cached data; and (ii) chunks that have to be retrieved from the data warehouse database. In this method, it is sometimes possible to compute a chunk from chunks at different levels of aggregation (each aggregation level corresponds to a group-by operation) (Deshpande & Naughton, 2000).

Such a computed-chunk mechanism may be implemented by Local Schedulers as their local caching strategy, and such computed chunks may also be considered as intra-site replica candidates.
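The chunk lookup described above can be sketched as follows. The chunk identifiers, the cache structure, and the `children` mapping (from a coarser chunk to the finer chunks that cover it) are illustrative assumptions, not the actual scheme of Deshpande et al.

```python
# Hypothetical chunk id: (aggregation_level, region). A coarser chunk
# can sometimes be computed by aggregating the finer cached chunks
# that cover it, as in chunk-based caching with computed chunks.
def answer_query(needed, cache, children):
    """Split `needed` into chunks served from the cache (directly or
    computable from finer cached chunks) and chunks that must be
    fetched from the data warehouse database."""
    served, fetched = [], []
    for chunk in needed:
        if chunk in cache:
            served.append(chunk)                   # cached as-is
        elif chunk in children and all(c in cache for c in children[chunk]):
            served.append(chunk)                   # computable from cache
        else:
            fetched.append(chunk)                  # must hit the DW
    return served, fetched
```

For example, a chunk at a coarser group-by level is served from the cache when all the finer chunks it covers are cached, even though the coarse chunk itself was never stored.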