built using an R script, so that parameter configuration can be automatic
when calling the original Maxent file, focusing on the special requirements
of the Humboldt Institute and including some Java VM fine-tuning. This
version is already configured as an e-Clouds application, accessible by all
users. As shown in Figure 6.6, it depends on the packages dismo, maptools,
sp, and rJava.
Three files are received by this application as parameters; the first is an
input R script, which contains the R commands needed to analyze the data.
The second is a stack file that contains different layers with characteristics
of Colombia, such as temperature, humidity, altitude, and so on in a raw
“.asc” data format. The third file contains the coordinates where a certain
species has been spotted in Colombia, in a defined comma-separated value
format. All the input files needed were previously uploaded to the S3-based
e-Clouds file system under a user account. The outputs of the application
differ based on the configuration, but usually include visual maps that show
the resulting model for a particular species and can be exported to file
formats such as PDF or HTML.
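To make the relationship between the layer files and the occurrence file concrete, the following is a minimal Python sketch, not the R script used by the application. It assumes that each ".asc" layer is an Esri ASCII grid with a six-line header using xllcorner and yllcorner, and that the occurrence file has the columns species, longitude, and latitude. The column names, the sample values, and the function names read_asc, sample, and sample_points are hypothetical and chosen only for illustration.

```python
import csv
import io

# Made-up sample data: a 3x2 layer (e.g., temperature) and two occurrence records.
ASC_TEXT = """\
ncols 3
nrows 2
xllcorner -75.0
yllcorner 4.0
cellsize 0.5
NODATA_value -9999
21.5 22.0 -9999
23.0 24.5 25.0
"""

CSV_TEXT = """\
species,longitude,latitude
example_species,-74.8,4.2
example_species,-73.6,4.9
"""


def read_asc(text):
    """Parse an ASCII grid into (header dict, list of rows of floats)."""
    lines = text.strip().splitlines()
    header = {}
    for line in lines[:6]:  # assumes exactly six header lines
        key, value = line.split()
        header[key.lower()] = float(value)
    rows = [[float(v) for v in line.split()] for line in lines[6:]]
    return header, rows


def sample(header, rows, lon, lat):
    """Return the cell value at (lon, lat), or None outside the grid or on NODATA."""
    col = int((lon - header["xllcorner"]) // header["cellsize"])
    row_from_bottom = int((lat - header["yllcorner"]) // header["cellsize"])
    # The first data row in the file is the northernmost one.
    row = int(header["nrows"]) - 1 - row_from_bottom
    if not (0 <= row < int(header["nrows"]) and 0 <= col < int(header["ncols"])):
        return None
    value = rows[row][col]
    return None if value == header["nodata_value"] else value


def sample_points(asc_text, csv_text):
    """Pair every occurrence record with the layer value at its coordinates."""
    header, rows = read_asc(asc_text)
    records = csv.DictReader(io.StringIO(csv_text))
    return [
        (r["species"], sample(header, rows, float(r["longitude"]), float(r["latitude"])))
        for r in records
    ]


if __name__ == "__main__":
    for species, value in sample_points(ASC_TEXT, CSV_TEXT):
        print(species, value)
```

A modeling tool needs one such value per layer for every occurrence point. Records that fall on a NODATA cell or outside the grid yield None here and would normally be discarded before fitting.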
Before e-Clouds was used, the application was executed on clusters deployed
on the university campus, made up of VMs using two cores of an Intel Core i7
processor and 8 GB of memory. In those executions, the files were kept on
network-attached storage. Similar jobs were executed there with the same
input files used for the tests in e-Clouds. With these clusters, the average
execution time for each job was 18 minutes.
As previously explained, the execution parameters are based on an initial
time estimation made by the application configurator. The selection of these
parameters affects other parameters, such as the total cost of the execution
and the total time that it takes to finish. Users can also supply their own
estimate, based on their knowledge of the application and the data to be
processed. The system recalculates the total cost and time whenever the
parameters are changed.
Two different approaches were used: The first one seeks to minimize the
total cost of the execution, and the other seeks to minimize the execution
time. The time required to run a single job to completion was estimated
beforehand. The total execution time is calculated by multiplying the number
of jobs by the expected time per job in minutes. Table 6.1
shows the results of the execution times and costs using different numbers
of species.
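The trade-off between the two approaches can be illustrated with a short Python sketch. It is not the e-Clouds configurator. The total time follows the rule stated above (number of jobs multiplied by the expected minutes per job), while the whole-hour billing per machine, the price of 10 cents per machine-hour, the limit of 20 machines, and all function names are assumptions made only for this example. The 18 minutes per job is the average reported for the campus clusters.

```python
import math
from typing import NamedTuple


class Plan(NamedTuple):
    machines: int
    total_minutes: int   # number of jobs x expected minutes per job
    wall_minutes: int    # elapsed time when jobs are spread over the machines
    cost_cents: int


def make_plan(num_jobs, minutes_per_job, machines, cents_per_hour):
    """Estimate time and cost for one choice of machine count."""
    total_minutes = num_jobs * minutes_per_job
    jobs_per_machine = math.ceil(num_jobs / machines)
    wall_minutes = jobs_per_machine * minutes_per_job
    # Assumption: every machine is billed in whole hours for the full run.
    billed_hours = machines * math.ceil(wall_minutes / 60)
    return Plan(machines, total_minutes, wall_minutes, billed_hours * cents_per_hour)


def candidate_plans(num_jobs, minutes_per_job, max_machines, cents_per_hour):
    return [
        make_plan(num_jobs, minutes_per_job, m, cents_per_hour)
        for m in range(1, max_machines + 1)
    ]


def cheapest(plans):
    """Minimize cost; break ties by shorter elapsed time."""
    return min(plans, key=lambda p: (p.cost_cents, p.wall_minutes))


def fastest(plans):
    """Minimize elapsed time; break ties by lower cost."""
    return min(plans, key=lambda p: (p.wall_minutes, p.cost_cents))


if __name__ == "__main__":
    # 20 jobs of 18 minutes each, hypothetical price and machine limit.
    plans = candidate_plans(num_jobs=20, minutes_per_job=18,
                            max_machines=20, cents_per_hour=10)
    print("cheapest:", cheapest(plans))
    print("fastest: ", fastest(plans))
```

Under these assumptions the cost-minimizing choice uses few machines and fills each billed hour, while the time-minimizing choice uses one machine per job and pays for mostly idle hours. Changing the per-job estimate changes both results, which is why the system recalculates cost and time when a user adjusts it.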
The average installation time refers to the time spent on the application
installation process. This process is only carried out once per machine
and execution. The results show that the application can be installed on
demand without significantly affecting the total time. The times obtained
from the earlier executions in private cluster environments are also similar
to those of the executions in AWS. It is important to note that a storage
system like S3 scales up adequately, since the execution time is not affected
by the number of machines.