13.5 Recovering a Server from a Failure
Server s_i recovers from a failure by executing the analysis, redo, and undo passes
of the ARIES recovery algorithm. Only minor modifications to the basic
(centralized-database) recovery algorithm are needed to handle distributed transactions.
In the analysis pass, the log at server s_i is scanned from the begin-checkpoint log
record of the last completely taken checkpoint to the end of the log, reconstructing
the active-transaction and modified-page tables. The actions performed during the
scan are as in the analysis pass for centralized database recovery (Algorithm 4.12),
except that the procedure analyze-log-record() (Algorithm 4.13) must also observe
log records of the forms (13.1)-(13.4). For (13.1), the server identifier s_i is inserted
into the list L of participants in the active-transaction table being reconstructed.
For (13.2) with LSN n, the record (T_i, s', T', n) is inserted into the active-transaction
table, and for (13.3) and (13.4), the state of the transaction is changed to
“prepared-to-commit.”
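The extra cases handled by analyze-log-record() can be sketched as follows. This is only an illustrative sketch, not the book's Algorithm 4.13: the forms (13.1)-(13.4) are not reproduced in this section, so the record kinds are named after their equation numbers, and the classes TxnEntry and LogRecord, their field names, and the function analyze_distributed_record are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TxnEntry:
    """One row of the active-transaction table (field names are assumptions)."""
    txn_id: str
    state: str = "forward-rolling"
    lsn: int = 0                               # the LSN n stored with the record
    participants: List[str] = field(default_factory=list)  # the list L
    coordinator: Optional[str] = None          # s' for a subtransaction
    parent: Optional[str] = None               # T' for a subtransaction


@dataclass
class LogRecord:
    """A simplified log record; 'kind' names the form it stands in for."""
    lsn: int
    kind: str
    txn_id: str
    server: Optional[str] = None
    parent: Optional[str] = None


def analyze_distributed_record(table: Dict[str, TxnEntry], rec: LogRecord) -> None:
    """Apply the distributed-transaction cases of the analysis pass."""
    if rec.kind == "form-13.1":
        # Add the server identifier to the participant list L of the transaction.
        entry = table[rec.txn_id]
        if rec.server not in entry.participants:
            entry.participants.append(rec.server)
    elif rec.kind == "form-13.2":
        # Insert the record (T_i, s', T', n) for a subtransaction.
        table[rec.txn_id] = TxnEntry(
            rec.txn_id, lsn=rec.lsn, coordinator=rec.server, parent=rec.parent
        )
    elif rec.kind in ("form-13.3", "form-13.4"):
        # Both forms mark the transaction as prepared to commit.
        table[rec.txn_id].state = "prepared-to-commit"
```

All other record types would be handled as in the centralized analysis pass; the sketch assumes the transaction named in a (13.1), (13.3), or (13.4) record is already present in the table.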
The redo pass of ARIES recovery at server s_i is performed in exactly the same
way as in a centralized database system (Algorithm 4.14).
In the undo pass of ARIES recovery (Algorithm 4.16) at server s_i, the set of active
transactions is now partitioned into five groups:
1. Not-prepared-to-commit forward-rolling transactions coordinated by the server s_i
2. Not-prepared-to-commit forward-rolling subtransactions of transactions coordinated
by servers other than s_i
3. Prepared-to-commit transactions coordinated by server s_i
4. Prepared-to-commit subtransactions of transactions coordinated by servers other
than s_i
5. Aborted transactions
The transactions in groups 1-3 are aborted and rolled back, and the rollback
of the transactions in group 5 is run to completion. Note that the transactions in
groups 1 and 2 are not yet prepared to commit; thus server s_i can abort them by
a unilateral decision. As regards transactions in group 3 (or in group 1), s_i is their
coordinator server and thus is entitled to decide the fate of the transactions.
The transactions in group 4 are termed in-doubt transactions: their fate cannot
be deduced from the information surviving in the log at server s_i. The parent of an
in-doubt transaction may have committed or aborted at the coordinator, or another
subtransaction of the same distributed transaction may have aborted at some other
participant. Only when the commit/abort information becomes available can an
in-doubt transaction be either committed or aborted at s_i.
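The five-way partition and the action taken for each group can be sketched as below. This is a minimal illustration under stated assumptions, not the book's Algorithm 4.16: the class ActiveTxn, its field names, the state strings, and the action labels are all hypothetical, and each entry is assumed to record the identifier of the coordinating server.

```python
from dataclasses import dataclass


@dataclass
class ActiveTxn:
    """An entry of the reconstructed active-transaction table (assumed layout)."""
    txn_id: str
    state: str        # "forward-rolling", "prepared-to-commit", or "aborted"
    coordinator: str  # identifier of the server coordinating the transaction


def undo_group(txn: ActiveTxn, self_id: str) -> int:
    """Return the group (1-5) of an active transaction at server self_id."""
    if txn.state == "aborted":
        return 5
    coordinated_here = txn.coordinator == self_id
    if txn.state == "prepared-to-commit":
        return 3 if coordinated_here else 4
    # Not yet prepared to commit and still forward-rolling.
    return 1 if coordinated_here else 2


def undo_action(txn: ActiveTxn, self_id: str) -> str:
    """Map a transaction to what the undo pass does with it."""
    group = undo_group(txn, self_id)
    if group in (1, 2, 3):
        return "abort-and-roll-back"   # the server may decide on its own
    if group == 4:
        return "keep-in-doubt"         # wait for the commit/abort information
    return "complete-rollback"         # group 5: finish the rollback in progress
```

For example, a prepared-to-commit subtransaction whose coordinator is another server falls into group 4 and is left in doubt, whereas the same state on a transaction coordinated locally falls into group 3 and is rolled back.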
For each transaction in group 2, the coordinator of the transaction is informed
about the abort of the transaction. Every participant of each transaction in groups 1
and 3 is informed about the abort of the transaction. Note that the record for a
subtransaction in group 2 in the reconstructed active-transaction table contains the
identifiers of the distributed transaction and its coordinator and that the record for a