Trusted Data Management for Grid-Based Medical Applications - Cloud, Grid and High Performance Computing: Emerging Applications


facto standard for user and host authentication on the Grid. GSI is used by most mature Grid middleware implementations. Shortcomings of this infrastructure are described later in this paper; here we introduce the basic GSI infrastructure.

GSI essentially comprises a Public Key Infrastructure (PKI) that is used to sign user identity and host certificates. Users can create limited-lifetime Proxy certificates, which allow them to send credentials with their jobs for authentication without the risk of compromising the user's private key. Proxy certificates are used for all transactions by a job, such as GridFTP transactions. We here assume that all authorization decisions with regard to data are based on GSI user authentication by means of Proxy certificates. Other approaches (such as role-based or attribute-based authorization, as proposed by Alfieri et al. (2004)) are possible, but not required for our framework. Many Grid infrastructures manage access control to resources and storage based on virtual organization (VO) membership information. However, VO-based authorization is often too coarse-grained for protecting medical information: there may be many users (e.g., researchers) in a VO, who may not all be equally trusted to access particular data. Therefore, we assume authorization based on user identities in this paper.
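
To make the difference in granularity concrete, the following is a minimal sketch contrasting VO-based with identity-based authorization. All names in it (the certificate subject names, the VO name, the file path, and the helpers `vo_allows` and `identity_allows`) are hypothetical and chosen for illustration only; they are not part of GSI or of any Grid middleware API.

```python
# Illustrative sketch only: hypothetical names, not a GSI or middleware API.

# Hypothetical VO membership, listing each member by certificate subject name.
VO_MEMBERS = {
    "medical-vo": {
        "/O=Grid/OU=HospitalA/CN=Alice",
        "/O=Grid/OU=UniversityB/CN=Bob",
    },
}

# Under VO-based authorization, a file is opened up to a whole VO.
FILE_VO = {
    "scans/patient-study.dat": "medical-vo",
}

# Under identity-based authorization, a file lists individual users.
FILE_ACL = {
    "scans/patient-study.dat": {"/O=Grid/OU=HospitalA/CN=Alice"},
}


def vo_allows(user_dn: str, path: str) -> bool:
    """Coarse-grained check: any member of the file's VO may access it."""
    vo = FILE_VO.get(path)
    return vo is not None and user_dn in VO_MEMBERS.get(vo, set())


def identity_allows(user_dn: str, path: str) -> bool:
    """Fine-grained check: only identities listed for this file may access it."""
    return user_dn in FILE_ACL.get(path, set())


if __name__ == "__main__":
    bob = "/O=Grid/OU=UniversityB/CN=Bob"
    path = "scans/patient-study.dat"
    # Bob is a VO member, so the VO-based check admits him,
    # while the identity-based check does not.
    print(vo_allows(bob, path), identity_allows(bob, path))
```

In this sketch, a researcher who is merely a member of the VO passes the VO-based check but is rejected by the identity-based one, which is the distinction the assumption above relies on.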

PROBLEM ANALYSIS

Grids are, by nature, distributed across multiple administrative domains, only a few of which may be trusted by a specific data owner. Grid middleware, and thus jobs, typically run on an operating system (OS), such as Linux, that allows administrators to access all information on the system. A job or data owner does not have control over the hardware or software that runs on some remote system. Besides OS and middleware vulnerabilities, these systems might also not be well protected against physical attacks, such as stealing hard disks. Such aspects should be part of a risk assessment when decisions are made on which sites are trusted to store or access particular information.

Given legal constraints, trust decisions will and should be conservative. For example, unencrypted data, file names, and other sensitive metadata should only be stored in trusted domains, e.g., in the hospital. This aspect is even more pressing in systems where jobs on remote machines can access medical data. Current OSs such as Linux provide little assurance that information stored on the system cannot be leaked to external parties (van 't Noordende, Balogh, Hofman, Brazier and Tanenbaum, 2007).

Even if files are removed after the job exits (e.g., temporarily created files), the contents could be readable by administrators or possibly attackers while the job executes. Furthermore, if the system is not properly configured, disks may contain left-over information from a job's previous execution, which is readable by an attacker who gains physical access to a storage device (NIST). As another example, it is possible to encrypt swap space in a safe way, but this is an option that has to be explicitly enabled in the OS. For these reasons, it is important for a data owner to identify critical aspects of the administration and configuration of a remote host before shipping data to (a job running on) that host.

Another problem is that a data owner can neither control nor know the trajectory that a job took before it was scheduled on a host, since this is implicit and hidden in current Grid middleware. Therefore, even if the host from which a job accesses data is trusted by the data owner, there is a risk that the job was manipulated on some earlier host.

Current middleware does not provide a way to securely bind jobs to Proxy certificates: a certificate or private key bundled with a program can easily be extracted and coupled to another program which pretends to be the original program. In Grids, this issue is exacerbated by the fact that a job may traverse several middleware processes
