instance generates a total of at least 327,726 files with a combined size of about 302 GB. In
addition, each step of the processing takes hours, and hence tens of hours of
supercomputer time are needed for the whole instance. On average, a typical pulsar
searching application instance generates more than 100,000 files totaling more
than 230 GB.
3.1.3 Storing pulsar searching data in the Cloud
The pulsar searching application currently runs on the Swinburne high-performance
supercomputing facility. Because the supercomputer is a shared facility that cannot
offer sufficient storage capacity to hold the accumulated terabytes of data, all the gen-
erated data are deleted after having been used once, and only the beam data extracted
from the raw telescope data are retained. However, at least some of these data
should ideally be stored for reuse. For example, the de-dispersion files can be reused
when applying different seeking algorithms to find potential pulsar candidates.
Regenerating these files takes hours each time, which not only delays the scientists
from conducting their experiments but also incurs a large amount of computation
overhead.
The Cloud offers storage and computing capacity that is, from the user's perspective,
unlimited: all the data generated during the execution of applications can be stored
and processed with high performance. This feature of the Cloud is highly desirable,
especially for scientific applications with data-intensive characteristics. By using
Cloud storage for the pulsar searching data, the storage limitation can be completely
eliminated, and much more of the generated data can be kept for convenient reuse.
If we execute the pulsar searching application in the Cloud, the cost of delivering
the raw telescope data is the same as before; that is, the raw data are stored on tapes
and sent to a data center via post. However, a new problem emerges: the cost of renting
Cloud storage resources for these data could be huge. As mentioned earlier in this
section, a typical pulsar searching instance generates more than 100,000 files totaling
more than 230 GB (essentially 690 GB of data stored in the Cloud under the
conventional three-replica replication strategy). According to the latest Amazon S3 stor-
age prices, storing 230 GB of data with the S3 standard storage service in a “U.S. standard”
region costs US$12.65 per month (i.e., $0.055 per GB per month). This storage cost seems
small. But to meet the needs of pulsar searching applications, we often need to store
much more data generated by much longer observations, and several hundred such
application instances may need to be conducted. For a series of observations
conducted 8 hours a day for 30 days, the size of the generated files could reach 543.6
TB (or 1630.8 TB physically stored in the Cloud). At the same S3 standard storage
price of $0.055 per GB per month, storing these files costs about US$29,900 per
month, of which two-thirds is in fact spent on storing data redundancy to provide the
data reliability assurance. Moreover, as the pulsar search-
ing program continues, the number and size of the generated files keep growing,
and hence the cost of storing data redundancy grows even higher.
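As a sanity check, this arithmetic can be reproduced with a short script. The following Python sketch simply multiplies data sizes by the quoted price; the $0.055 per GB per month rate and the three-replica factor are the values cited above, and the script is purely illustrative (it calls no real pricing API).

# Storage-cost arithmetic for the figures quoted in this section.
PRICE_PER_GB_MONTH = 0.055   # Amazon S3 standard storage, "U.S. standard" region
REPLICAS = 3                 # conventional three-replica replication strategy

def monthly_cost_usd(logical_gb):
    # S3 charges on the logical data size; replication happens internally.
    return logical_gb * PRICE_PER_GB_MONTH

# A single application instance: more than 230 GB of generated files.
instance_gb = 230
print(f"One instance: {instance_gb} GB logical, "
      f"{instance_gb * REPLICAS} GB physically stored, "
      f"US${monthly_cost_usd(instance_gb):.2f} per month")

# A 30-day observation series at 8 hours per day: about 543.6 TB of files.
series_gb = 543.6 * 1000
print(f"30-day series: {series_gb / 1000:.1f} TB logical, "
      f"{series_gb * REPLICAS / 1000:.1f} TB physically stored, "
      f"US${monthly_cost_usd(series_gb):,.0f} per month")

Running the script yields US$12.65 per month for a single instance and US$29,898 per month for the 30-day series, matching the US$29,900 figure above after rounding; the physically stored sizes (690 GB and 1630.8 TB) are three times the logical sizes, which is where the two-thirds redundancy cost comes from.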