Closing the Gap between Cloud Providers and Scientific Users - Cloud Computing with e-Science Applications

built using an R script so that parameters are configured automatically when the original Maxent file is called. The script addresses the specific requirements of the Humboldt Institute and includes some Java VM fine-tuning. This version is already configured as an e-Clouds application and is accessible to all users. As shown in Figure 6.6, it depends on the packages dismo, maptools, sp, and rJava.
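The wrapper itself is an R script, but the idea of assembling the Maxent invocation automatically, with a tuned JVM heap, can be sketched in Python. This is a minimal sketch, not the e-Clouds implementation. It assumes Maxent is launched directly from a `maxent.jar` file using Maxent's `key=value` command-line arguments, and the helper names, paths, and 75% heap heuristic are hypothetical. Inside an R wrapper that goes through dismo and rJava, the JVM heap is typically set instead with `options(java.parameters = "-Xmx...")` before rJava is loaded.

```python
import shlex
from typing import List


def maxent_command(layers_dir: str, samples_csv: str, output_dir: str,
                   heap_mb: int = 1024, jar: str = "maxent.jar") -> List[str]:
    """Build a command line that runs Maxent from its JAR.

    The -Xmx flag caps the JVM heap (the "fine-tuning" step); the remaining
    key=value arguments follow Maxent's command-line syntax. All paths are
    hypothetical placeholders.
    """
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    return [
        "java",
        f"-Xmx{heap_mb}m",                    # maximum JVM heap size
        "-jar", jar,
        f"environmentallayers={layers_dir}",  # directory of .asc layers
        f"samplesfile={samples_csv}",         # species occurrence CSV
        f"outputdirectory={output_dir}",      # where results are written
        "autorun",                            # start the run immediately
    ]


def heap_for_machine(total_mb: int, fraction: float = 0.75) -> int:
    """Hypothetical heuristic: give the JVM a fixed share of the VM's RAM."""
    return int(total_mb * fraction)


if __name__ == "__main__":
    cmd = maxent_command("layers", "occurrences.csv", "out",
                         heap_mb=heap_for_machine(8192))
    print(shlex.join(cmd))
```

Deriving the heap from the machine's memory, rather than hard-coding it, lets the same wrapper run unchanged on VMs of different sizes.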

The application receives three files as parameters. The first is an input R script containing the R commands needed to analyze the data. The second is a stack file containing layers that describe characteristics of Colombia, such as temperature, humidity, and altitude, in raw “.asc” data format. The third contains the coordinates at which a given species has been observed in Colombia, in a predefined comma-separated values (CSV) format. All required input files were uploaded beforehand to the S3-based e-Clouds file system under a user account. The outputs of the application depend on its configuration but usually include maps that show the resulting model for a particular species; these can be exported to file formats such as PDF or HTML.
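To make the input formats concrete, the following Python sketch reads one raster layer and an occurrence file, then looks up the layer value under each recorded sighting, which is the kind of pairing a species distribution model works from. It assumes each layer is an ESRI ASCII grid with a six-line header that uses `xllcorner`/`yllcorner` and includes the optional `NODATA_value` line. It also assumes the occurrence CSV has a header row with the columns `species`, `longitude`, `latitude`; those header names, the sample data, and the function names are hypothetical.

```python
import csv
import io
import math
from typing import Dict, List, Optional, TextIO, Tuple

Header = Dict[str, float]
Grid = List[List[float]]


def read_asc(f: TextIO) -> Tuple[Header, Grid]:
    """Parse an ESRI ASCII grid: six 'key value' header lines, then cell
    values listed row by row from north to south (assumed layout)."""
    header: Header = {}
    for _ in range(6):
        key, value = f.readline().split()
        header[key.lower()] = float(value)
    nrows, ncols = int(header["nrows"]), int(header["ncols"])
    values = [float(v) for v in f.read().split()]
    if len(values) != nrows * ncols:
        raise ValueError("cell count does not match the header")
    grid = [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]
    return header, grid


def value_at(header: Header, grid: Grid, lon: float, lat: float) -> Optional[float]:
    """Return the layer value under (lon, lat), or None if the point falls
    outside the grid or on a NODATA cell."""
    size = header["cellsize"]
    col = math.floor((lon - header["xllcorner"]) / size)
    row_from_bottom = math.floor((lat - header["yllcorner"]) / size)
    nrows, ncols = len(grid), len(grid[0])
    if not (0 <= col < ncols and 0 <= row_from_bottom < nrows):
        return None
    value = grid[nrows - 1 - row_from_bottom][col]  # row 0 is the northern edge
    return None if value == header.get("nodata_value") else value


def read_occurrences(f: TextIO) -> List[Tuple[str, float, float]]:
    """Read occurrence records with columns species,longitude,latitude."""
    return [(row["species"], float(row["longitude"]), float(row["latitude"]))
            for row in csv.DictReader(f)]


# Hypothetical sample data: a 2x3 layer and three sightings.
ASC = """ncols 3
nrows 2
xllcorner -75.0
yllcorner 4.0
cellsize 1.0
NODATA_value -9999
10 11 -9999
20 21 22
"""
CSV_TEXT = "species,longitude,latitude\nsp1,-74.5,4.5\nsp1,-72.5,5.5\nsp1,-80.0,4.5\n"

if __name__ == "__main__":
    header, grid = read_asc(io.StringIO(ASC))
    for species, lon, lat in read_occurrences(io.StringIO(CSV_TEXT)):
        print(species, lon, lat, value_at(header, grid, lon, lat))
```

With this sample data, the first sighting falls in a cell with value 20.0; the second lands on a NODATA cell and the third lies west of the grid, so both yield None.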

Previously, the application was executed on clusters deployed on the university campus, consisting of VMs with two cores of an Intel Core i7 processor and 8 GB of memory. In those executions, the files were stored on network-attached storage. Similar jobs had been executed with the same input files used for the tests in e-Clouds. On these clusters, the average execution time per job was 18 minutes.

As previously explained, the execution parameters are based on an initial time estimate made by the application configurator. The choice of these parameters determines other values, such as the total cost of the execution and the total time it takes to finish. Users can also enter their own estimate, based on their knowledge of the application and the data to be processed; the system recalculates the total cost and time whenever the parameters are changed.
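The recalculation can be illustrated with a small Python sketch. It assumes that each machine runs one job at a time, that all jobs take the estimated time, and that machines are billed per started hour at a hypothetical flat rate; `ExecutionPlan` and its fields are illustrative names, not part of e-Clouds. Replacing the per-job estimate, as a user would, yields a new plan whose time and cost are derived again from the same formulas.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExecutionPlan:
    jobs: int                      # number of jobs (assumed: one per species)
    minutes_per_job: int           # estimated time per job
    machines: int                  # VMs running jobs in parallel
    price_per_machine_hour: float  # hypothetical flat hourly rate

    @property
    def total_job_minutes(self) -> int:
        # Sum of all job times: number of jobs times time per job.
        return self.jobs * self.minutes_per_job

    @property
    def wall_clock_minutes(self) -> int:
        # Jobs are spread evenly; each machine runs its share one after another.
        rounds = math.ceil(self.jobs / self.machines)
        return rounds * self.minutes_per_job

    @property
    def cost(self) -> float:
        # Every machine is billed for each started hour of the run (assumption).
        billed_hours = math.ceil(self.wall_clock_minutes / 60)
        return self.machines * billed_hours * self.price_per_machine_hour

    def with_user_estimate(self, minutes_per_job: int) -> "ExecutionPlan":
        # Overriding the configurator's estimate produces a new plan;
        # time and cost are recomputed by the properties above.
        return replace(self, minutes_per_job=minutes_per_job)


if __name__ == "__main__":
    plan = ExecutionPlan(jobs=10, minutes_per_job=18, machines=5,
                         price_per_machine_hour=0.5)
    print(plan.wall_clock_minutes, plan.cost)          # configurator estimate
    adjusted = plan.with_user_estimate(40)
    print(adjusted.wall_clock_minutes, adjusted.cost)  # user's own estimate
```

Making the plan immutable and deriving time and cost on demand means a changed estimate can never leave a stale total behind.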

Two different approaches were used: the first seeks to minimize the total cost of the execution, and the second seeks to minimize the execution time. In both cases, the time required to run a single job to completion was estimated beforehand. The total execution time, in minutes, is calculated by multiplying the number of jobs by the expected time per job. Table 6.1 shows the resulting execution times and costs for different numbers of species.
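The two approaches can be contrasted with a short Python sketch that enumerates cluster sizes and picks either the cheapest or the fastest option. It keeps the total-time formula from the text (number of jobs multiplied by minutes per job) and adds assumptions that are not from the source: jobs are spread evenly over identical machines, each machine runs one job at a time, and every machine is billed per started hour at a hypothetical rate. Ties are broken by the other criterion.

```python
import math
from typing import List, NamedTuple


class Option(NamedTuple):
    machines: int
    wall_minutes: int
    cost: float


def total_execution_minutes(jobs: int, minutes_per_job: int) -> int:
    # Total execution time as described: jobs x expected minutes per job.
    return jobs * minutes_per_job


def plan_options(jobs: int, minutes_per_job: int, price_per_hour: float,
                 max_machines: int) -> List[Option]:
    options = []
    for machines in range(1, max_machines + 1):
        # Machines work in parallel; each processes its share sequentially.
        wall = math.ceil(jobs / machines) * minutes_per_job
        # Each machine is billed for every started hour (assumption).
        cost = machines * math.ceil(wall / 60) * price_per_hour
        options.append(Option(machines, wall, cost))
    return options


def cheapest(options: List[Option]) -> Option:
    # Approach 1: minimize cost; prefer the faster option on ties.
    return min(options, key=lambda o: (o.cost, o.wall_minutes))


def fastest(options: List[Option]) -> Option:
    # Approach 2: minimize execution time; prefer the cheaper option on ties.
    return min(options, key=lambda o: (o.wall_minutes, o.cost))


if __name__ == "__main__":
    opts = plan_options(jobs=10, minutes_per_job=18, price_per_hour=0.5,
                        max_machines=10)
    print("total job minutes:", total_execution_minutes(10, 18))
    print("cheapest:", cheapest(opts))
    print("fastest:", fastest(opts))
```

With these hypothetical numbers, one machine is cheapest (180 minutes, three billed hours), while ten machines finish fastest (18 minutes) at a higher cost, which is the trade-off the two approaches resolve differently.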

The average installation time refers to the time spent installing the application. This process is carried out only once per machine for each execution. The results show that the application can be installed on demand without significantly affecting the total time. They also show that the times obtained in the earlier executions on the private cluster environments are similar to those of the execution in AWS. It is important to note that, with a storage system such as S3, the setup scales adequately, since the execution time is not affected by the number of machines.
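A back-of-the-envelope Python sketch shows why a one-time installation step has little effect on the total: its cost is spread over every job the machine runs. It assumes that installation precedes the jobs on each machine and that the time per job does not change with the number of machines, which is what the S3 observation above suggests. The installation time in the example is hypothetical; the 18 minutes per job matches the earlier average on the campus clusters.

```python
import math


def cluster_wall_minutes(jobs: int, machines: int, install_minutes: float,
                         minutes_per_job: float) -> float:
    # Each machine installs the application once, then runs its share of
    # jobs sequentially; per-job time is assumed independent of cluster size.
    jobs_per_machine = math.ceil(jobs / machines)
    return install_minutes + jobs_per_machine * minutes_per_job


def install_share(jobs: int, machines: int, install_minutes: float,
                  minutes_per_job: float) -> float:
    # Fraction of the wall-clock time spent on installation.
    return install_minutes / cluster_wall_minutes(
        jobs, machines, install_minutes, minutes_per_job)


if __name__ == "__main__":
    for machines in (1, 2, 5, 10):
        wall = cluster_wall_minutes(20, machines, install_minutes=3,
                                    minutes_per_job=18)
        share = install_share(20, machines, 3, 18)
        print(f"{machines:>2} machines: {wall:.0f} min, install {share:.1%}")
```

Under these assumptions, the installation share rises as each machine runs fewer jobs, so on-demand installation matters most when its duration is comparable to that of a single job.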
