support for distributed logging and richer content
filtering options.
Currently, OSG resources can optionally log all information related to Grid processes using syslog-ng and send it to a central collector managed by the GOC. The primary uses for this information are:
1. Troubleshooting - Being able to trace the workflow of a distributed job is very useful as a debugging tool for failures. It makes it significantly easier to determine how and why a job might be failing, especially when multiple sites are involved. The OSG GOC has a troubleshooting team to deal with such cases.

2. Security Incident Response - Having centralized logs available to the OSG security team makes it much easier to analyze the scope and extent of a security compromise. It allows the GOC to identify compromised sites or users, and to judge the nature of the compromise. Affected sites can then be notified for rapid incident response.

In the troubleshooting case, there is a need to keep failure modes from becoming publicly available, as this could reveal possible avenues for attack. For example, a poorly configured site may have vulnerabilities in the execution path; while not apparent through the standard client software, these may be exposed through syslog information. In general, logging information should only be available to authorized personnel within the OSG administrative domain, or to specific users when debugging problems. Another approach to this issue involves tuning the level of logging performed by the site, so that only a minimal amount of information is logged by default. This translates to logging only the start and stop times of jobs and data transfers for a given user. In the event of a failure, the site can increase the level of logging and work in conjunction with the troubleshooting team and the user to diagnose the specific problem.

Security incident information is perhaps even more sensitive, and syslog information revealing incident details must have tight access controls. Once again, this points to restricting the information to an authorized set of security personnel.

Syslog-ng allows for collectors on a per-site basis (Tierney, Gunter & Schopf, 2007), which can then filter the information passed on to the OSG-wide collector. This allows sites to collect detailed information internally while filtering what is sent to the OSG. Any information sent to the OSG GOC should be encrypted. As long as enough information is sent to identify a failure or compromise at the central level, the relevant sites can be notified; the sites can then address the specifics of the problem and provide more information to the OSG GOC and security team as necessary. This is the model expected to go into production for future OSG deployments.

Site Availability and Validation Data

The OSG GOC performs site availability and validity tests on participating compute and storage elements, and publishes the results online. These tests are run at regular intervals, either using a Perl script (site_verify.pl) or using a customizable set of probes called RSV (Resource and Service Validation) ("OSG Resource and Service Validation Project"). The basic aim is to validate the services being advertised through the resource selection and monitoring modules (CEMon). Much of the information collected here is analogous to CEMon information, and is subject to the same issues. The RSV probes use a push model, similar to the Gratia service. The site_verify.pl script takes the form of a remote grid job run by the GOC at individual sites, relaying information back using the standard Globus data movement protocols.
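The per-site collection and filtering approach described above for syslog-ng can be sketched as a configuration fragment. This is only a minimal illustration assuming syslog-ng 3.x syntax; the collector host name, port, file paths, and the job start/stop filter pattern are hypothetical and do not describe any actual OSG deployment.

```
# Illustrative per-site syslog-ng collector (all names/addresses are assumptions).

# Receive log messages from Grid services on this site.
source s_grid { udp(port(514)); };

# Keep full, detailed records locally for site administrators.
destination d_local { file("/var/log/grid/detailed.log"); };

# Forward only a filtered summary (e.g., job start/stop events),
# over an encrypted channel, to the OSG-wide collector at the GOC.
filter f_summary {
    match("JOB_START" value("MESSAGE")) or match("JOB_END" value("MESSAGE"));
};
destination d_osg {
    tcp("collector.goc.example.org" port(6514)
        tls(ca_dir("/etc/syslog-ng/ca.d")));
};

log { source(s_grid); destination(d_local); };
log { source(s_grid); filter(f_summary); destination(d_osg); };
```

The two log statements capture the model in the text: everything is retained at the site, while only enough information to identify a failure or compromise centrally leaves the site, encrypted in transit.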