[7], recovery management in distributed systems is investigated. In [29],
rollback recovery techniques for long-running applications are thoroughly
discussed. In [30, 31, 32, 33], checkpoint-based rollback recovery is
discussed. In [34], reliability modeling and evaluation criteria are
thoroughly discussed. More recently, (a) David Patterson et al. proposed
the concept of ROC (Recovery-Oriented Computing) [35], in which recovery is
used as a general technique for dealing with failures in complex systems.
For example, [36] proposes a model of “recursive recovery” in which a
complex software system is decomposed into a multi-layer, modular,
self-recovering implementation. (b) The Nooks approach [37] makes device
driver failures transparent to the operating system.
Unfortunately, due to the fundamental differences between failure recovery
and attack recovery mentioned in Section 1, existing failure recovery
techniques cannot effectively deal with malicious attacks. For example,
(a) rolling back the application's state to a previous corruption-free
checkpoint loses all the good work done after that checkpoint. (b)
Maintaining frequent checkpoints [38, 39, 40] may not help, since no
checkpoint taken between the time of attack and the time of recovery can be
used. (c) Standby replica systems will replicate not only the good work but
also the infection!
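Points (a) and (b) can be illustrated with a small simulation. The sketch
below is only an illustration under stated assumptions: the key-value
state, the transaction records with a "malicious" flag, and the helper
names (run, rollback_to_clean_checkpoint) are all hypothetical and do not
come from any of the cited systems.

```python
from copy import deepcopy


def apply(state, txn):
    """Apply one transaction's writes (key -> new value) to the state."""
    state.update(txn["writes"])


def run(history, checkpoint_every):
    """Replay the history, snapshotting the state every N transactions."""
    state = {}
    checkpoints = [(0, {})]  # (number of transactions applied, snapshot)
    for i, txn in enumerate(history, start=1):
        apply(state, txn)
        if i % checkpoint_every == 0:
            checkpoints.append((i, deepcopy(state)))
    return state, checkpoints


def rollback_to_clean_checkpoint(history, checkpoints):
    """Return the latest snapshot taken before the first malicious
    transaction, plus the ids of legitimate transactions that are lost."""
    attack_pos = next(
        i for i, t in enumerate(history, start=1) if t["malicious"]
    )
    # Only checkpoints taken strictly before the attack are corruption-free.
    clean = [c for c in checkpoints if c[0] < attack_pos]
    pos, snapshot = max(clean, key=lambda c: c[0])
    lost = [t["id"] for t in history[pos:] if not t["malicious"]]
    return deepcopy(snapshot), lost


# Hypothetical history: T3 is the attack, T4 and T5 are later good work.
history = [
    {"id": "T1", "writes": {"a": 1}, "malicious": False},
    {"id": "T2", "writes": {"b": 2}, "malicious": False},
    {"id": "T3", "writes": {"a": 999}, "malicious": True},
    {"id": "T4", "writes": {"c": 3}, "malicious": False},
    {"id": "T5", "writes": {"d": 4}, "malicious": False},
]

final_state, checkpoints = run(history, checkpoint_every=2)
recovered, lost = rollback_to_clean_checkpoint(history, checkpoints)
print(recovered)  # {'a': 1, 'b': 2}
print(lost)       # ['T4', 'T5']
```

In this toy run the checkpoint taken after T4 already contains the
corrupted value written by T3, so it cannot be used, and rolling back to
the checkpoint taken after T2 discards the legitimate work of T4 and T5.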
With DQR in data processing systems as the theme of this paper, this
section will focus on failure recovery technologies for data processing
systems and their limitations in solving the DQR problem. In the following, we
classify failure recovery technologies for data processing systems into three
categories: transactional undo/redo, replication-based recovery, and storage
media backup-restore, and discuss them in three subsections, respectively.
3.1 Transactional Undo/Redo
The crux of transactional undo/redo techniques is correcting application
states that have been corrupted by failures. For data-processing systems or
data-oriented applications, whose main activity is reading and writing
various data objects (managed by a set of databases), failure recovery is
rooted in the transaction concept [41], which has been around for a long
time. This concept encapsulates the ACID (Atomicity, Consistency,
Isolation, and Durability) properties [3, 41]. Data-oriented applications
are not limited to the database area [42, 43, 44, 7, 45, 46]. The basic
recovery procedure is almost the same for all applications: when a failure
happens, a set of undo operations is performed to roll back the
application's state to the most recent checkpoint, which is maintained
through logging; then a set of redo operations is performed to restore the
state to the exact point of failure. Nevertheless, the concrete recovery
algorithms depend heavily on how changes are logged. WAL (Write-Ahead
Logging) is today the standard approach, widely accepted by the database
industry. Some of the commercial systems and prototypes based on WAL are
ARIES [26], IBM's AS/400 [47],