[7], recovery management in distributed systems is investigated. In [29], rollback
recovery techniques for long-running applications are thoroughly discussed.
In [30, 31, 32, 33], checkpoint-based rollback recovery is discussed. In [34],
reliability modeling and evaluation criteria are thoroughly discussed. More
recently, (a) David Patterson et al. have proposed the concept of ROC
(Recovery-Oriented Computing) [35], in which recovery is used as a general
technique for dealing with failure in complex systems. For example, in [36] a model
of “recursive recovery” is proposed in which a complex software system is de-
composed into a multi-layer modular self-recovering implementation. (b) The
Nooks approach [37] makes device driver failures transparent to operating
systems.
Unfortunately, due to the fundamental differences mentioned in Section 1
between failure recovery and attack recovery, existing failure recovery
techniques cannot effectively deal with malicious attacks. For example, (a) rolling
back the application's state to a previous corruption-free checkpoint will lose
all the good work done after the checkpoint. (b) Maintaining frequent
checkpoints [38, 39, 40] may not work, since no checkpoint taken between the time
of attack and the time of recovery can be used. (c) Standby replica systems
will replicate not only good work, but also the infection!
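Points (a) and (b) can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not a technique taken from the cited work: the workload, the transaction labels, and the helper names `run_with_checkpoints` and `rollback_recovery` are all hypothetical, the state is a plain key-value map, and the transactions after the attack are assumed to be independent of the corrupted item (that is, they are "good work").

```python
from copy import deepcopy

# Hypothetical workload: (transaction id, key, value, is_malicious).
# T4..T6 write keys unrelated to the attack, so they count as good work.
WORKLOAD = [
    ("T1", "a", 1, False),
    ("T2", "b", 2, False),
    ("T3", "a", 666, True),   # the attack
    ("T4", "c", 3, False),
    ("T5", "d", 4, False),
    ("T6", "e", 5, False),
]

def run_with_checkpoints(workload, every):
    """Apply transactions in order, snapshotting the state every `every` transactions."""
    state = {}
    checkpoints = [(0, {})]  # (number of transactions applied, snapshot)
    for i, (_, key, value, _) in enumerate(workload, start=1):
        state[key] = value
        if i % every == 0:
            checkpoints.append((i, deepcopy(state)))
    return state, checkpoints

def rollback_recovery(workload, checkpoints):
    """Restore the latest checkpoint taken before the first malicious transaction."""
    attack_pos = next(i for i, t in enumerate(workload) if t[3])  # 0-based index
    # Only checkpoints that cover transactions strictly before the attack are clean.
    usable = [(n, snap) for n, snap in checkpoints if n <= attack_pos]
    n, snapshot = usable[-1]
    # Every good transaction after the restored checkpoint is discarded.
    lost_good = [t[0] for t in workload[n:] if not t[3]]
    unusable = len(checkpoints) - len(usable)
    return snapshot, lost_good, unusable

state, checkpoints = run_with_checkpoints(WORKLOAD, every=1)
snapshot, lost_good, unusable = rollback_recovery(WORKLOAD, checkpoints)
print(snapshot, lost_good, unusable)
```

Even with a checkpoint after every transaction, the recovered state contains only the effects of T1 and T2: the good transactions T4, T5, and T6 are lost, and every checkpoint taken after the attack is unusable because it already contains the corrupted value.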
With DQR in data processing systems as the theme of this paper, this
section will focus on failure recovery technologies for data processing
systems and their limitations in solving the DQR problem. In the following, we
classify failure recovery technologies for data processing systems into three
categories: transactional undo/redo, replication-based recovery, and storage
media backup-restore, and discuss them in three subsections, respectively.
3.1 Transactional Undo/Redo
The crux of transactional undo/redo techniques is correcting the application
states that are corrupted due to failures. For data-processing systems or data-
oriented applications in which doing read and write operations on various data
objects (managed by a set of databases) represents the main activities, failure
recovery is rooted in the transaction concept [41], which has been around for
a long time. This concept encapsulates the ACID (Atomicity, Consistency,
Isolation, and Durability) properties [3, 41]. Data-oriented applications are
not limited to the database area [42, 43, 44, 7, 45, 46]. The basic recovery
procedure is almost the same for all applications: when a failure happens, a
set of undo operations is performed to roll back the application's state to
the most recent checkpoint, which is maintained through logging; then a set
of redo operations is performed to restore the state to exactly the point of
failure. Nevertheless, the concrete recovery algorithms depend heavily upon
how changes are logged. WAL (Write-Ahead Logging) is today the standard
approach widely accepted by the database industry. Some of the commercial
systems and prototypes based on WAL are ARIES [26], IBM's AS/400 [47],