Biomedical Engineering Reference
or the entire grid. Each interactive chart provides the ability to display detailed
job information.
2. Job History. The ACDC-Grid monitoring system provides detailed historical
job information including CPU consumption rates and job production rates for
either an individual user or a group over a subset of grid resources. To date,
1,600,000 jobs have run on Grid3 since October 2003. Summary charts are
compiled from usage data based on user jobs or VOs for a given range of dates
over a given set of resources. Statistics such as total jobs, average runtime, total
CPU time consumed, and so forth are dynamically produced from the available
database. Each interactive chart allows for detailed information to be displayed.
3. ACDC Site Status. The ACDC-Grid monitoring system generates dynamic
ACDC site status logs that report successful monitoring events as well as the
specific Grid3 site errors corresponding to monitoring event failures.
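The summary statistics described under Job History (total jobs, average runtime, total CPU time over a date range and a set of resources) amount to a filtered aggregation over job records. The following is a minimal sketch under stated assumptions: the record layout (JobRecord and its fields) and the function name summarize are hypothetical, since the ACDC-Grid database schema is not given here, and CPU time is approximated as runtime multiplied by processor count.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class JobRecord:
    # Hypothetical fields; the actual monitoring schema is not described in the text.
    user: str
    vo: str
    resource: str
    end_date: date
    runtime_s: float  # wall-clock runtime in seconds
    cpus: int         # number of processors used


def summarize(
    jobs: Iterable[JobRecord],
    start: date,
    end: date,
    resources: Set[str],
    user: Optional[str] = None,
    vo: Optional[str] = None,
) -> Dict[str, float]:
    """Aggregate jobs that ended in [start, end] on the given resources,
    optionally restricted to a single user or VO."""
    selected = [
        j for j in jobs
        if start <= j.end_date <= end
        and j.resource in resources
        and (user is None or j.user == user)
        and (vo is None or j.vo == vo)
    ]
    total_jobs = len(selected)
    total_runtime = sum(j.runtime_s for j in selected)
    # Assumption: CPU time is approximated as runtime x processors.
    total_cpu = sum(j.runtime_s * j.cpus for j in selected)
    return {
        "total_jobs": total_jobs,
        "avg_runtime_s": total_runtime / total_jobs if total_jobs else 0.0,
        "total_cpu_s": total_cpu,
    }
```

Passing user or vo narrows the summary to an individual or a group; leaving both unset summarizes every job on the selected resources.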
24.5.2 Scheduling
The ACDC-Grid predictive scheduler uses a database of historical jobs to profile the
usage of a given resource on a user, group, or account basis [46-54]. Accurate
quality-of-service estimates for grid-enabled applications can be defined
in terms of a combination of historical and runtime user parameters, in addition to
specific resource information. Such a methodology is incorporated into the
ACDC-Grid Portal, which continually refines the predictive scheduler parameters
based, in part, on the data stored by the monitoring system.
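To illustrate how historical job data might feed a per-user profile, the sketch below estimates a runtime from the ratio of actual to requested wall-time in past jobs. This is an illustrative simplification, not the ACDC-Grid method, which is not specified here: the function names, the tuple layout of the history, and the mean-ratio estimator are all assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def build_profiles(history: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Build a per-user profile from (user, requested_s, actual_s) tuples.

    The profile is the mean ratio of actual runtime to requested wall-time.
    """
    ratios = defaultdict(list)
    for user, requested_s, actual_s in history:
        if requested_s > 0:  # skip malformed records
            ratios[user].append(actual_s / requested_s)
    return {user: mean(values) for user, values in ratios.items()}


def predict_runtime(
    profiles: Dict[str, float],
    user: str,
    requested_s: float,
    default_ratio: float = 1.0,
) -> float:
    """Predict a runtime for a new job from the user's historical ratio.

    Users without history fall back to default_ratio. The estimate is capped
    at the requested wall-time, assuming the queue terminates jobs at that limit.
    """
    ratio = profiles.get(user, default_ratio)
    return min(requested_s, ratio * requested_s)
```

Rebuilding the profiles as new job records arrive corresponds to the continual refinement of scheduler parameters described above; a grouping key other than the user (group or account) would work the same way.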
Workload also plays a significant role in determining resource utilization. The
native queue schedulers typically use the designated job wall-time for managing
resource backfill (i.e., small pockets of unutilized resources that are being held for
a scheduled job). However, such systems may also use a weighted combination of
node, process, and wall-time to determine a base priority for each job and subsequently
modify this priority to impose a fair share resource policy based on historical usage.
The backfill system will allow a job with lower priority to overtake a job with higher
priority if it does not delay the start of the prioritized job. The ACDC-Grid predictive
scheduler uses historical information to better profile grid users and more accurately
determine execution times. Our prototype predictive scheduling system is based on
statistical principles [55] that allow jobs to more effectively run in a backfill mode.
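The priority and backfill rules described above can be sketched as follows. This is a hedged illustration rather than the behavior of any particular queue scheduler: the weights, the additive fair-share penalty, and all names (Job, base_priority, fair_share_adjust, can_backfill) are assumptions. The backfill test is deliberately conservative: a candidate qualifies only if it fits in the currently idle nodes and its estimated runtime ends before the reserved start of the higher-priority job.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int            # nodes requested
    est_runtime_s: float  # estimated (or requested) runtime in seconds


def base_priority(nodes, processes, walltime_s, weights=(1.0, 1.0, 1.0)):
    """Weighted combination of node, process, and wall-time (weights are hypothetical)."""
    w_nodes, w_procs, w_time = weights
    return w_nodes * nodes + w_procs * processes + w_time * walltime_s


def fair_share_adjust(priority, usage_fraction, target_fraction, penalty=1.0):
    """Lower the priority of a user whose historical usage exceeds a target share.

    Assumption: a simple additive penalty proportional to the excess usage.
    """
    excess = max(0.0, usage_fraction - target_fraction)
    return priority - penalty * excess


def can_backfill(candidate, free_nodes, now, reserved_start):
    """Return True if the candidate can run now without delaying the reserved job.

    Conservative rule: the candidate must fit in the idle nodes and must finish
    no later than the time at which the higher-priority job is scheduled to start.
    """
    fits = candidate.nodes <= free_nodes
    finishes_in_time = now + candidate.est_runtime_s <= reserved_start
    return fits and finishes_in_time
```

Because the test depends on est_runtime_s, an overestimated runtime can block a job that would in fact have fit in the gap, which is why more accurate execution-time estimates let more jobs run in backfill mode.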
We consider the aforementioned shared- and distributed-memory computational
resources at SUNY-Buffalo's CCR. The ACDC-Grid Portal executes many
grid-enabled scientific applications on several of the center's heterogeneous resources
concurrently. Several applications have inter-dependent execution and data requirements,
which require reliable knowledge of job start and completion times.
An explanation of the development of the ACDC-Grid predictive scheduler is best
served by considering a snapshot of the queue for a single computational resource.
Table 24.1 shows 15 running and queued jobs on this resource (Dell P4 cluster with
Myrinet) from six users, which initially completely occupy all processors on all nodes
(i.e., all 516 processors on the 258 dual-processor nodes). There are seven running