High Throughput Data Movement - Scientific Data Management

TABLE 5.1 Comparison of GTC run times on the ORNL Cray XT3 development machine for two input sizes using different data output mechanisms

Run Parameters   Time for 100 Iterations
                 582,410 ions   1,164,820 ions
No Output        213            422
Lustre           232            461
DataTap          220            435
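
The short Python sketch below makes Table 5.1 concrete. It computes each output mechanism's slowdown relative to the no-output run. The dictionary and function names are illustrative only. The times are copied from the table as given, and the table does not state their units.

```python
# Run times for 100 GTC iterations, copied from Table 5.1
# (units are not stated in the table).
RUN_TIMES = {
    "No Output": (213, 422),
    "Lustre": (232, 461),
    "DataTap": (220, 435),
}
ION_COUNTS = (582_410, 1_164_820)


def output_overhead(mechanism: str) -> list[float]:
    """Percentage slowdown of `mechanism` relative to the no-output baseline."""
    baseline = RUN_TIMES["No Output"]
    times = RUN_TIMES[mechanism]
    return [100.0 * (t - b) / b for t, b in zip(times, baseline)]


if __name__ == "__main__":
    for mechanism in ("Lustre", "DataTap"):
        for ions, pct in zip(ION_COUNTS, output_overhead(mechanism)):
            print(f"{mechanism:8s} {ions:>9,d} ions: {pct:4.1f}% overhead")
```

By this arithmetic, writing through Lustre adds roughly 9% to the run time at both input sizes (8.9% and 9.2%). DataTap adds roughly 3% (3.3% and 3.1%).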

where the data is partitioned into different bounding boxes. Once the data is received by the DataTap server, we filter it based on the bounding box and then transfer it for visualization. Copies of both the whole dataset and the multiple small partitioned datasets are then forwarded to the storage nodes. Since GTC can potentially generate petabytes of data, we find it necessary to filter or reduce the total amount of data. The bounding box computation takes 2.29 s, and transferring the filtered data takes 0.037 s. In the second implementation, we transfer the data first and run the bounding box filter after the transfer. The bounding box filter takes the same time (2.29 s), but the transfer time increases to 0.297 s. What matters is not the particular values for the two cases but the relationship between them, which shows the relative advantages and disadvantages of each. In the first implementation, the total time to transfer the data and run the bounding box filter is lower (2.327 s versus 2.587 s). However, the computation is performed on the DataTap server, which increases the latency with which the server services requests. In the second implementation, the computation is performed on a remote node, reducing the impact on the DataTap server. The value of this approach is that it allows an end user to compose a utility function that accounts for the cost of time spent at each location. Since most centers charge only for time on the big machines, the maximum utility will often show that filtering should be done on the remote nodes. If the transmission time to the remote site were to increase enough to slow down the computation by more than the filtering time, higher utility would come from filtering the data before moving it. Thus, it is important that the I/O system be flexible enough to allow the user to switch between these two cases.
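
The trade-off between the two implementations can be expressed as a simple cost model. The Python sketch below is a hypothetical illustration, not the authors' method. It treats work on the DataTap server (including sending the data) as charged time and work on the remote node as separately weighted. By default it charges only server-side time, reflecting the observation that centers charge for time on the big machines. The 2.29 s, 0.037 s, and 0.297 s figures are the measurements quoted above. The class, function, and parameter names are invented for illustration.

```python
from dataclasses import dataclass

# Times quoted in the text, in seconds.
FILTER_TIME = 2.29         # bounding box filter (same in both implementations)
TRANSFER_FILTERED = 0.037  # transfer after filtering on the DataTap server
TRANSFER_FULL = 0.297      # transfer of the unfiltered data


@dataclass(frozen=True)
class Placement:
    name: str
    server_time: float  # seconds of work done on the DataTap server
    remote_time: float  # seconds of work done on the remote node

    @property
    def total_time(self) -> float:
        return self.server_time + self.remote_time

    def cost(self, server_weight: float, remote_weight: float) -> float:
        """Hypothetical cost: time at each location times its charge weight."""
        return server_weight * self.server_time + remote_weight * self.remote_time


def placements(transfer_full: float = TRANSFER_FULL) -> list[Placement]:
    return [
        # First implementation: filter on the server, then send the small boxes.
        Placement("filter_at_server", FILTER_TIME + TRANSFER_FILTERED, 0.0),
        # Second implementation: send everything, then filter remotely.
        Placement("filter_at_remote", transfer_full, FILTER_TIME),
    ]


def best_placement(transfer_full: float = TRANSFER_FULL,
                   server_weight: float = 1.0,
                   remote_weight: float = 0.0) -> str:
    """Return the placement with the lowest cost, i.e. the highest utility.

    The defaults model a center that charges only for server-side time.
    """
    return min(placements(transfer_full),
               key=lambda p: p.cost(server_weight, remote_weight)).name


if __name__ == "__main__":
    for p in placements():
        print(f"{p.name}: total {p.total_time:.3f} s")
    print(best_placement())                    # filter_at_remote
    print(best_placement(transfer_full=3.0))   # filter_at_server
    print(best_placement(remote_weight=1.0))   # filter_at_server
```

With equal weights the model reduces to comparing total times (2.327 s versus 2.587 s), which favors filtering on the server. Charging only for server time flips the decision to remote filtering. A large enough increase in the full-data transfer time (3.0 s in the example) flips it back.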

5.2.3 High-Speed Asynchronous Data Extraction Using DART

As motivated previously, scientific applications require a scalable and robust substrate for managing the large amounts of data generated and for asynchronously extracting and transporting them between interacting components. DART (decoupled and asynchronous remote transfers) [10] is an alternate
