ARDA stands for “A Realisation of Distributed Analysis” (http://cern.ch/arda) and is jointly funded by EGEE and CERN, with substantial contributions from several institutes such as the Russian institutes in LCG and the Taipei Academia Sinica Grid Center. In HEP, the word “analysis” denotes all computing activities performed, almost independently, by individual physicists, sometimes organized in small teams. In general they share a common software foundation, but each individual or team has a set of different executables, tailored to a specific scientific task. All analyses share part of the input data (experimental data, both raw and reconstructed, plus simulation data) but often rely on private copies of “derived data.” Frequent multiple passes over subsets of the data are the rule. The impact of this activity on grid computing is relevant in at least three areas.
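This access pattern can be caricatured as follows; the file names and the selection step are purely illustrative placeholders, not tied to any specific experiment's software.

```python
# Caricature of the HEP analysis access pattern: one pass over the shared
# (raw/reconstructed/simulated) input produces a private derived dataset,
# which is then re-read many times while the analysis is tuned.
shared_input = [f"shared/reco_run{i:03d}.root" for i in range(100)]

def select_events(filename):
    # Placeholder for the experiment-specific selection code (an assumption
    # made only for this sketch).
    return int(filename[-8:-5]) % 3 == 0

# Step 1: a single selection pass builds the private copy of "derived data."
derived_data = [f"private/derived_run{f[-8:-5]}.root"
                for f in shared_input if select_events(f)]

# Step 2: repeated passes over the (much smaller) derived dataset are the rule.
for iteration in range(3):
    print(f"pass {iteration}: re-reading {len(derived_data)} derived files")
```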
The first area is the size of the potential user community (in the case of the LHC experiments, several thousand physicists), which calls for a robust system that is reasonably user friendly and transparent. Analysis is therefore very different from the organized activities (detector simulation, raw data reconstruction, etc.) that are performed by a single expert team in a coordinated way. Realistically, if a large community is to use the grid, this should not force unnecessary changes in its way of working (analysis is a day-to-day activity). With grid technologies still in a fast-evolution phase, users should be shielded at least from nonessential changes in the internal components of the infrastructure.
The second area is again intimately connected to users' expectations. Users are interested in performing analysis on the grid only if they can get a faster turnaround time or access to larger or more complex datasets. The potential benefit of larger resources could be reduced (or even disappear) if continuous expert support is needed, as in troubleshooting activities. This observation translates into the requirement for a system that not only provides sheer power but is also reliable and efficient. Users have to be able to rely on getting their results back within dependable time limits. High efficiency implies no need for time-consuming operations such as resubmitting jobs because the system failed to accept them, to access the data, or to return the results. Simple access to relevant monitoring information is clearly key.
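To make the point concrete, the following is a minimal sketch of the kind of retry-and-monitor loop that an analysis front end might run on the user's behalf, so that the user does not have to resubmit failed jobs by hand. The submit, status, and fetch_output callables are hypothetical stand-ins for the underlying middleware client, not part of any specific ARDA or EGEE interface.

```python
import time

MAX_RETRIES = 3      # give up after a few failed attempts
POLL_INTERVAL = 60   # seconds between checks of the monitoring information

def run_with_resubmission(job_description, submit, status, fetch_output):
    """Submit a job and transparently resubmit it on failure.

    `submit`, `status`, and `fetch_output` are assumed callables standing
    in for the real grid middleware; they are illustration-only assumptions.
    """
    for attempt in range(1, MAX_RETRIES + 1):
        job_id = submit(job_description)
        while True:
            state = status(job_id)           # e.g. "RUNNING", "DONE", "FAILED"
            if state == "DONE":
                return fetch_output(job_id)  # results back to the user
            if state == "FAILED":
                print(f"attempt {attempt}: job {job_id} failed, resubmitting")
                break                        # fall through to resubmission
            time.sleep(POLL_INTERVAL)        # keep polling until a final state
    raise RuntimeError("job did not succeed within the retry budget")
```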
The third area is data access. Data access on the grid is a field of research in itself. In the analysis use case, users should be empowered with simple but powerful tools to place, locate, and access the data. HEP is quite unique in the area of data management, as we will see in the following, due to the requirements coming from aggregated data sizes (over several PB per year for several years of operation of the experiment and physics analysis), and the need for replication and broad access (user communities of the order of several thousand scientists).
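As an illustration of the “place, locate, and access” pattern, the sketch below resolves a logical file name through a toy replica catalogue and prefers a replica at a chosen site. The catalogue contents, site names, and URLs are invented for the example and do not reflect any real LCG file catalogue or its API.

```python
# Toy replica catalogue: logical file name -> list of physical replicas.
# The entries are invented for illustration; a real HEP catalogue would
# hold many millions of such mappings.
REPLICA_CATALOGUE = {
    "lfn:/experiment/2008/raw/run001.root": [
        "srm://storage.cern.ch/raw/run001.root",
        "srm://storage.sinica.tw/raw/run001.root",
    ],
}

def locate(lfn, preferred_site=None):
    """Return the physical replicas of a logical file, preferred site first."""
    replicas = REPLICA_CATALOGUE.get(lfn, [])
    if preferred_site:
        # Replicas hosted at the preferred site sort to the front of the list.
        replicas = sorted(replicas, key=lambda url: preferred_site not in url)
    return replicas

if __name__ == "__main__":
    for url in locate("lfn:/experiment/2008/raw/run001.root",
                      preferred_site="sinica"):
        print(url)
```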