Database Reference
In-Depth Information
The remainder of the paper is structured as follows. A review of the related
work is presented in Section 2. In section 3 we describe the intermediate algo-
rithm HYBRIDJOIN that already uses an index in the join process. In Section 4,
we present the difference between HYBRIDJOIN and X-HYBRIDJOIN, and also
derive the cost model for X-HYBRIDJOIN. The experimental study is discussed
in Section 5 and finally Section 6 concludes the paper.
2 Related Work
In this section, we present an overview of the previous work that has been done
in this area, focusing on that which is closely related to our problem domain.
A novel algorithm Mesh Join (MESHJOIN) [8] [9] has been designed espe-
cially for joining a continuous stream with a disk-based relation, such as the
scenario in active data warehouses. Although it is an adaptive approach, there
are some research issues such as suboptimal distribution of memory among the
join components and an inecient strategy for accessing the disk-based relation.
The algorithm makes no assumptions about data distribution or the organiza-
tion of the master data. However, the problem that we are addressing is that
MESHJOIN's performance is directly coupled to the size of the master data ta-
ble, and its performance is inversely proportional to the size of the master data
table. This is an undesired behavior if the master data becomes very large, and
our analysis will show that it is indeed an unnecessary behavior. The problem
becomes obvious if we consider that the master data table contains a large part
that is never joined with the stream data.
A revised version of MESHJOIN called R-MESHJOIN (reduced Mesh Join)
[12] has been presented by us. It addresses the issue of optimal distribution
of memory among the join components. In this algorithm a new strategy for
memory distribution among the join components is introduced capturing real
constraints. However the issue of an inecient strategy for accessing the disk-
based relation still exists in R-MESHJOIN.
One approach for improving MESHJOIN has been a partition-based join al-
gorithm [10] which can also deal with stream intermittence. It uses a two-level
hash table in order to attempt to join stream tuples as soon as they arrive, and
uses a partition-based waiting area for other stream tuples. For the algorithm in
[10], however, the time that a tuple waits for execution is not bounded. We are,
however, interested in a join approach where the time in which a stream tuple
is joined is guaranteed.
3
Index-Based Hash Join Architecture: HYBRIDJOIN
In this section we introduce the HYBRIDJOIN algorithm, which implements our
first modification of MESHJOIN in order to make use of a non-clustered index.
We introduce the join architecture for HYBRIDJOIN. This will be used, with a
single modification, for the algorithm X-HYBRIDJOIN as well.
 
Search WWH ::




Custom Search