Damage Quarantine and Recovery in Data Processing Systems - Database Security: Applications and Trends


a “good” chance to suffer from a big “hit” from attacks. Due to data sharing, interdependencies, and interoperability between business processes and applications, the hit could greatly “amplify” its damage by causing catastrophic cascading effects, which may “force” an application to shut itself down for hours or even days before it is recovered from the hit. (Note that high-speed Internet, e-commerce, and the global economy have greatly increased the speed and scale of damage spreading.) The cascading damage and loss of business continuity (i.e., DoS) may pose too much risk. Because not all intrusions can be prevented, DQR is an indispensable part of the corresponding security solution, and a quality DQR scheme may have a significant impact on risk management, business continuity, and assurance.

Secondly, due to several fundamental differences between failure recovery and attack recovery, the DQR problem cannot be solved by failure recovery technologies, which are very mature in handling random failures. (a) Failure recovery in general assumes the semantics of fail-stop, while attack recovery in general cannot assume the semantics of attack-stop: to achieve the adversary's goal, most attacks (except for DoS) do not simply crash the system; they prefer hidden damage and live zombies, spyware, bots, etc. Under the fail-stop assumption, quarantine is not really a problem for failure recovery; however, intrusion/damage quarantine is a challenging research topic in attack recovery and it can make a big difference. (b) Failure recovery assumes that all operations (e.g., transactions) have equal rights to be recovered, while attack recovery can never assume “equal rights” because neither malicious operations nor corrupted operations should be recovered.
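
Point (b) can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the scheme reviewed in this article: the names `Txn` and `find_damaged` are hypothetical, the log is assumed to be in commit order with known read and write sets, and the set of malicious transactions is assumed to be supplied by some intrusion detector. A transaction counts as corrupted if it read an item last written by a malicious or corrupted transaction; everything else keeps its right to be recovered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical log record: transaction id plus its read and write sets."""
    tid: str
    reads: frozenset
    writes: frozenset


def find_damaged(log, malicious):
    """Return ids of transactions that are malicious or corrupted.

    Assumes `log` is in commit order and `malicious` is already known.
    """
    tainted_items = set()   # items whose current value is damaged
    damaged = set()         # malicious plus corrupted transaction ids
    for txn in log:
        if (txn.tid in malicious) or (txn.reads & tainted_items):
            damaged.add(txn.tid)
            tainted_items |= txn.writes
        else:
            # A clean transaction that overwrites an item leaves a clean value.
            tainted_items -= txn.writes
    return damaged


log = [
    Txn("T1", frozenset(), frozenset({"a"})),          # malicious write to a
    Txn("T2", frozenset({"a"}), frozenset({"b"})),     # reads damaged a
    Txn("T3", frozenset({"c"}), frozenset({"d"})),     # unaffected
    Txn("T4", frozenset({"b", "d"}), frozenset({"e"})),  # reads damaged b
    Txn("T5", frozenset({"c"}), frozenset({"a"})),     # cleanly overwrites a
    Txn("T6", frozenset({"a"}), frozenset({"f"})),     # reads the clean a
]

damaged = find_damaged(log, malicious={"T1"})
preserved = [t.tid for t in log if t.tid not in damaged]
```

Here `damaged` holds the operations that must not be recovered, while `preserved` holds the work that attack recovery should keep. A failure recovery scheme, by contrast, would redo all six transactions without distinction.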

Towards understanding and solving the DQR problem, the rest of the article is organized as follows. In Section 2, we present a comprehensive yet tangible description of the DQR problem. In Section 3, we discuss in depth the limitations of traditional fault tolerance and failure recovery techniques in solving the DQR problem. In Section 4, we present a systematic review of how the DQR problem is being solved. In Section 5, we propose a set of remaining research issues in fully solving the DQR problem and conclude the article.

2 Overview of the DQR Problem

We are concerned with the DQR needs of mission/life/business-critical information systems. Since those information systems have been designed, implemented, deployed, and upgraded over several decades, they run both conventional applications, which typically use proprietary user interfaces and application-level client-server protocols [1], and modern applications, which are typically web-based and run standard Web Services protocols.

Nevertheless, both conventional and modern mission/life/business-critical applications share some common characteristics: they are typically part of a large-scale, semantically rich, networked, interoperable information system;
