Databases Reference
In-Depth Information
of r i is r i− 1 and the right neighbor is r i +1 , for an insertion the owner has to
compute signatures
r i +1 ).
For k consecutive updates in the best case a total of k + 2 signature compu-
tations are required for insertions and 1 for deletions if the deleted tuples are
consecutive. In the worst case a total of 2 k signature computations are needed
for insertions and k for deletions, if no two tuples are consecutive. Given k
updates, suppose the expected number of signatures to be computed is repre-
sented by E
S
( r i− 1 |
r i )and
S
( r i |
r i +1 ), and for a deletion
S
( r i− 1 |
2 k ). Then the additional I/O incurred is equal
to E { k }·| s P , excluding the I/Os incurred for updating the B + -tree structure.
Since the cost of signature computations is larger than even the I/O cost of
random disk accesses, a large number of updates is expected to have a very
expensive updating cost. The total update cost for the ASB-tree is:
{
k
}
( k
E
{
k
}≤
}·C s + E
{
k
}·|
s
|
a
C
u = E
{
k
·C IO .
(19)
P
The Merkle B-tree
The MB-tree can support ecient updates since only hash values are stored
for the records in the tree and, first, hashing is orders of magnitude faster
then signing, second, for each tuple only the path from the affected leaf to
the root need to be updated. Hence, the cost of updating a single record is
dominated by the cost of I/Os. Assuming that no reorganization to the tree
occurs the cost of an insertion is
u =
S |h| .
In realistic scenarios though one expects that a large number of updates
will occur at the same time. In other cases the owner may decide to do a
delayed batch processing of updates as soon as enough changes to the database
have occurred. The naive approach for handling batch updates would be to do
all updates to the MB-tree one by one and update the path from the leaves
to the root once per update. Nevertheless, in case that a large number of
updates affect a similar set of nodes (e.g., the same leaf) a per tuple updating
policy performs an unnecessary number of hash function computations on
the predecessor path. In such cases, the computation cost can be reduced
significantly by recomputing the hashes of all affected nodes only once, after
all the updates have been performed on the tree. A similar analysis holds for
the incurred I/O as well.
Clearly, the total update cost for the per tuple update approach for k
insertions is k
C
H |r| + d m (
H f m |h| +
C IO )+
u which is linear to the number of affected nodes k
d m .
The expected cost of k updates using batch processing can be computed
as follows. Given k updates to the MB-tree, assuming that all tuples are
updated uniformly at random and using a standard balls and bins argu-
ment, the probability that leaf node X has been affected at least once is
P ( X )=1
·C
·
1
f d m 1
m
) k
(1
and the expected number of leaf nodes that have
been affected is f d m 1
m
P ( X ). Using the same argument, the expected num-
ber of nodes at level i (where i = 1 is the leaf level and 1
·
i
d m )is
 
Search WWH ::




Custom Search