support for distributed logging and richer content
filtering options.
Currently, OSG resources can optionally log all information related to Grid processes using syslog-ng and send it to a central collector managed by the GOC. The primary uses for this information are:
1. Troubleshooting - Being able to trace the workflow of a distributed job is very useful as a debugging tool for failures. It makes it significantly easier to determine how and why a job might be failing, especially when multiple sites are involved. The OSG GOC has a troubleshooting team to deal with such cases.

2. Security Incident Response - Having centralized logs available to the OSG security team makes it much easier to analyze the scope and extent of a security compromise. It allows the GOC to identify compromised sites or users, and to judge the nature of the compromise. Affected sites can then be notified for rapid incident response.

In the troubleshooting case, there is a need to keep failure modes from becoming publicly available, as this could reveal possible avenues for attack. For example, a poorly configured site may have vulnerabilities in the execution path; while not apparent through the standard client software, these may be exposed through syslog information. In general, logging information should only be available to authorized personnel within the OSG administrative domain, or to specific users when debugging problems. Another approach to this issue involves tuning the level of logging performed by the site, so that only a minimal amount of information is logged by default. This translates to logging only the start and stop times of jobs and data transfers for a given user. In the event of a failure, the site can increase the level of logging and work in conjunction with the troubleshooting team and the user to diagnose the specific problem.

Security incident information is perhaps even more sensitive, and syslog information revealing incident details must have tight access controls. Once again, this points to restricting the information to an authorized set of security personnel.

Syslog-ng allows for collectors on a per-site basis (Tierney, Gunter & Schopf, 2007), which can then filter the information passed on to the OSG-wide collector. This allows sites to collect detailed information internally while filtering what is sent to the OSG. Any information sent to the OSG GOC should be encrypted. As long as enough information is sent to identify a failure or compromise at the central level, the relevant sites can be notified; the sites can then address the specifics of the problem and provide more information to the OSG GOC and security team as necessary. This is the model expected to go into production for future OSG deployments.

Site Availability and Validation Data

The OSG GOC performs site availability and validity tests on participating compute and storage elements, and publishes the results online. These tests are run at regular intervals, either using a Perl script (site_verify.pl) or using a customizable set of probes called RSV (Resource and Service Validation) ("OSG Resource and Service Validation Project"). The basic aim is to validate the services being advertised through the resource selection and monitoring modules (CEMon). Much of the information collected here is analogous to CEMon information, and is subject to the same issues. The RSV probes use a push model, similar to the Gratia service. The site_verify.pl script takes the form of a remote grid job run by the GOC at individual sites, relaying information back using the standard Globus data movement protocols.
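The per-site collection and filtering approach described above for syslog-ng can be sketched as a configuration fragment. This is only a minimal illustration assuming syslog-ng 3.x syntax; the collector host name, port, file paths, and the job start/stop filter pattern are hypothetical and do not describe any actual OSG deployment.

```
# Illustrative per-site syslog-ng collector (all names/addresses are assumptions).

# Receive log messages from Grid services on this site.
source s_grid { udp(port(514)); };

# Keep full, detailed records locally for site administrators.
destination d_local { file("/var/log/grid/detailed.log"); };

# Forward only a filtered summary (e.g., job start/stop events),
# over an encrypted channel, to the OSG-wide collector at the GOC.
filter f_summary {
    match("JOB_START" value("MESSAGE")) or match("JOB_END" value("MESSAGE"));
};
destination d_osg {
    tcp("collector.goc.example.org" port(6514)
        tls(ca_dir("/etc/syslog-ng/ca.d")));
};

log { source(s_grid); destination(d_local); };
log { source(s_grid); filter(f_summary); destination(d_osg); };
```

The two log statements capture the model in the text: everything is retained at the site, while only enough information to identify a failure or compromise centrally leaves the site, encrypted in transit.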