Coordination of Access to Large-Scale Datasets in Distributed Environments - Scientific Data Management - page 132

Figure 4.4

Whole file transfer versus remote input and output.

[Figure: two panels. "Whole File Transfer": a bulk transfer between central storage and the applications (Appl), which then perform small local I/O ops. "Remote Input and Output": the applications (Appl) perform small remote I/O ops against central storage.]

Remote I/O exploits selectivity. When users explore very large datasets interactively, they may not need the entire contents of the repository. When using remote I/O, only the data that is actually needed for the current application is retrieved.
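
The selectivity argument can be illustrated with a small sketch. The `CountingStore` class below is a hypothetical in-memory stand-in for a central storage server, not a real API, and the fixed 64-byte record size is an assumption made only for the example. The sketch compares how many bytes leave the store when one record is read after a whole file transfer versus through a single ranged remote read.

```python
RECORD_SIZE = 64  # assumed fixed record size, chosen only for illustration


class CountingStore:
    """Hypothetical stand-in for central storage; counts bytes it sends."""

    def __init__(self, data: bytes):
        self._data = data
        self.bytes_sent = 0

    def fetch_all(self) -> bytes:
        # Whole file transfer: every byte crosses the network.
        self.bytes_sent += len(self._data)
        return self._data

    def fetch_range(self, offset: int, length: int) -> bytes:
        # Remote I/O: only the requested byte range crosses the network.
        chunk = self._data[offset:offset + length]
        self.bytes_sent += len(chunk)
        return chunk


def read_record_whole_file(store: CountingStore, index: int) -> bytes:
    """Copy the entire dataset locally, then read one record from the copy."""
    local_copy = store.fetch_all()
    start = index * RECORD_SIZE
    return local_copy[start:start + RECORD_SIZE]


def read_record_remote(store: CountingStore, index: int) -> bytes:
    """Ask the store for just the bytes of one record."""
    return store.fetch_range(index * RECORD_SIZE, RECORD_SIZE)


if __name__ == "__main__":
    dataset = bytes(range(256)) * 1000  # 256,000 bytes of sample data
    whole, remote = CountingStore(dataset), CountingStore(dataset)
    assert read_record_whole_file(whole, 10) == read_record_remote(remote, 10)
    print("whole file transfer sent:", whole.bytes_sent, "bytes")
    print("remote I/O sent:", remote.bytes_sent, "bytes")
```

Both functions return the same record; the difference is only in how much data had to move to produce it.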

Remote I/O minimizes use of local storage. The storage available where programs actually run (on users' desktops and in computing clusters) is not likely to have the capacity or performance of an institutional data storage system. Employing remote I/O removes the local disk from the system, which improves performance and increases the number of possible execution sites for a given program.

Remote I/O minimizes initial response time. When using file transfer for a large workload, no processing begins until an entire input file is transferred, and no output becomes available until some output file transfers complete. For long-running jobs, this can be a problem, because the user cannot even verify that the program is running correctly until the workload completes. When using remote I/O, outputs become immediately available to the end user for verification or further processing.
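
The difference in initial response time can be sketched as follows. This is a minimal illustration, assuming the data source is any file-like object (an in-memory `io.BytesIO` stands in for remote storage here) and that "processing" is just summing the bytes of each chunk. Both function names are hypothetical. The first function yields nothing until the whole input has been read; the second yields a result as soon as each chunk arrives, and reports how many bytes had been read at that point.

```python
import io
from typing import BinaryIO, Iterator, List, Tuple


def process_after_full_transfer(source: BinaryIO, chunk_size: int) -> Tuple[int, List[int]]:
    """Whole-file style: read everything first, then process all chunks.

    Returns the number of bytes read before the first result existed,
    together with the list of per-chunk results.
    """
    data = source.read()
    results = [sum(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    return len(data), results


def process_incrementally(source: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Remote-I/O style: yield (bytes read so far, result) chunk by chunk."""
    bytes_read = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        bytes_read += len(chunk)
        yield bytes_read, sum(chunk)


if __name__ == "__main__":
    payload = bytes([1]) * 1000
    waited, _ = process_after_full_transfer(io.BytesIO(payload), 100)
    first_bytes, first_result = next(process_incrementally(io.BytesIO(payload), 100))
    print("bytes read before first output (whole file):", waited)
    print("bytes read before first output (incremental):", first_bytes)
```

A user watching the incremental version can check the first result after one chunk, rather than after the full input has arrived.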

On the other hand, a remote I/O system may require a large number of network round trips, because each small I/O request must be serviced separately. If the remote I/O is performed over a high-latency wide area network, the result may be very low CPU utilization, because the CPU is constantly waiting for a network operation to complete. In high-latency networks, it is more practical to perform whole file transfers. In practice, remote I/O is best used when the application is interactive or uses data selectively, or when the execution nodes are storage constrained.
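
The round-trip trade-off can be made concrete with a deliberately simplified cost model. The sketch below assumes that every remote request is synchronous and costs one full round trip plus its transmission time, with no caching, prefetching, or pipelining, and that a whole file transfer costs a single round trip plus transmission time. The function names and the sample figures (file size, request size, latencies, bandwidth) are illustrative assumptions, not measurements.

```python
def whole_file_seconds(file_bytes: float, rtt_s: float, bandwidth_bps: float) -> float:
    """One round trip, then the whole file at full bandwidth (bytes/second)."""
    return rtt_s + file_bytes / bandwidth_bps


def remote_io_seconds(n_requests: int, request_bytes: float,
                      rtt_s: float, bandwidth_bps: float) -> float:
    """Each small request pays a full round trip plus its own transmission time."""
    return n_requests * (rtt_s + request_bytes / bandwidth_bps)


def cpu_utilization(compute_s: float, io_wait_s: float) -> float:
    """Fraction of wall-clock time spent computing rather than waiting on I/O."""
    return compute_s / (compute_s + io_wait_s)


# Assumed scenario: a 1 GB file, of which the application touches
# 1,000 blocks of 4 KiB, over a link carrying 100 MB/s.
FILE_BYTES = 1_000_000_000
N_REQUESTS = 1_000
REQUEST_BYTES = 4_096
BANDWIDTH = 100_000_000  # bytes per second

LAN_RTT = 0.0005  # 0.5 ms, assumed local-area latency
WAN_RTT = 0.1     # 100 ms, assumed wide-area latency

if __name__ == "__main__":
    for name, rtt in (("LAN", LAN_RTT), ("WAN", WAN_RTT)):
        whole = whole_file_seconds(FILE_BYTES, rtt, BANDWIDTH)
        remote = remote_io_seconds(N_REQUESTS, REQUEST_BYTES, rtt, BANDWIDTH)
        util = cpu_utilization(1.0, remote)  # assume 1 s of pure computation
        print(f"{name}: whole file {whole:.2f} s, remote I/O {remote:.2f} s, "
              f"CPU utilization with remote I/O {util:.1%}")
```

Under these assumptions, selective remote I/O is much cheaper than a whole file transfer at local-area latency, while at wide-area latency the per-request round trips dominate and the whole file transfer finishes sooner.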
