Closing the Gap between Cloud Providers and Scientific Users - Cloud Computing with e-Science Applications


As part of the model, directories are logical entities that provide file organization. Each directory can have associated files or subdirectories (children). Files are stored in S3 inside defined buckets and folders. Each file object is aware of its physical location in S3 and also knows its parent directory. This way, the web application can handle an organized file structure, and it can also manage a first level of authorization by associating each file with a single user (its owner). In addition to the data model, a set of file browser views was necessary to facilitate user interaction.
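
The model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the e-Clouds code: the class names Directory and FileObject, their fields, and the bucket and key values in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Directory:
    """Logical entity used only for organization (hypothetical name)."""
    name: str
    parent: Optional["Directory"] = None
    subdirectories: List["Directory"] = field(default_factory=list)
    files: List["FileObject"] = field(default_factory=list)

    def add_subdirectory(self, name: str) -> "Directory":
        # Create a child directory and link it in both directions.
        child = Directory(name=name, parent=self)
        self.subdirectories.append(child)
        return child

    def path(self) -> str:
        # Walk up to the root to build the logical path a file browser shows.
        parts = []
        node: Optional[Directory] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


@dataclass(eq=False)
class FileObject:
    """A file that knows its S3 location, its parent directory, and its owner."""
    name: str
    owner: str
    bucket: str
    key: str
    parent: Directory

    def __post_init__(self) -> None:
        # Register the file with its parent so the tree stays navigable.
        self.parent.files.append(self)

    def s3_uri(self) -> str:
        return f"s3://{self.bucket}/{self.key}"

    def can_access(self, user: str) -> bool:
        # First level of authorization: only the owner may access the file.
        return user == self.owner
```

For example, a file created with owner "alice", bucket "eclouds-data", and key "alice/results/out.dat" under a "results" subdirectory of "home" would report the logical path "/home/results" for its parent and deny access to any user other than "alice".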

6.5.2.1.2 Transactional Data

As part of the e-Clouds solution, there is a central database management system for storing data related to transactions. This includes, but is not limited to, basic data such as user profiles, security associations, resource usage, S3 file metadata, applications, and, of course, user executions. Besides application and execution information, the database contains what is expected in a standard web application. Database connections can only be established by the web portal and the RM, which enhances security and makes administration (updates, tests, etc.) easier.

6.5.2.1.3 Local Storage

The main purpose of local storage is to store execution-related data on each cluster machine. It is primarily used as low-latency, low-cost storage for installation files, libraries, input files, and execution results. All information that resides in local storage is considered ephemeral, so every time an execution finishes, output files and logs should be uploaded to S3 and indexed in the transactional data. Everything else on local storage will be erased once the machine shuts down.
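
The end-of-execution step can be sketched as follows. This is a minimal illustration, assuming that outputs and logs live under an "output" subdirectory of the working directory and that S3 keys follow an "executions/<id>/..." layout; both conventions, the function name finalize_execution, and the upload and index callbacks are hypothetical. The callbacks stand in for the real S3 client and the transactional database, which are not shown here.

```python
import os
from typing import Callable, Dict, List


def finalize_execution(
    workdir: str,
    execution_id: str,
    upload: Callable[[str, str], None],
    index: Callable[[Dict[str, object]], None],
) -> List[str]:
    """Upload every output file and record it in the transactional data.

    upload(local_path, key) is assumed to copy one file to S3.
    index(row) is assumed to store one metadata row in the database.
    """
    output_dir = os.path.join(workdir, "output")  # assumed layout
    uploaded = []
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            local_path = os.path.join(root, name)
            # Build a key that mirrors the local layout, with "/" separators.
            rel = os.path.relpath(local_path, output_dir).replace(os.sep, "/")
            key = f"executions/{execution_id}/{rel}"
            upload(local_path, key)
            index({
                "execution": execution_id,
                "key": key,
                "size": os.path.getsize(local_path),
            })
            uploaded.append(key)
    # Nothing is deleted here: the remaining local data simply disappears
    # when the machine shuts down.
    return sorted(uploaded)
```

Passing the storage and database operations in as callbacks keeps the sketch independent of any particular client library and makes the step easy to exercise without cloud access.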

6.5.2.2 Queue Messaging

Reliable message queues are the main communication channel between the different components that make up e-Clouds. In this first version, AWS Simple Queue Service (SQS) is used. Figure 6.3 shows how the information flows between the queues and the corresponding communicating entities. It is important to note that there are two main, always-on queues: the prescheduling and scheduling queues. In addition, there is one queue for each user execution, used mainly for job assignment. It is created when the execution is launched and destroyed when it finishes.

The prescheduling queue carries the messages that travel between the web portal and the RM. The scheduling queue holds the initial messages that go from the RM to all the machines in a cluster and receives state updates from these same machines. Finally, execution-specific queues are used to assign pending jobs to the associated machines.
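
The queue layout can be illustrated with an in-memory stand-in. The sketch below uses Python's standard queue module instead of SQS, so it shows only the topology and the lifecycle of the per-execution queues, not SQS behavior; the class name QueueTopology, its method names, and the message fields are hypothetical.

```python
import queue


class QueueTopology:
    """In-memory stand-in for the e-Clouds queues (not SQS itself)."""

    def __init__(self):
        # The two always-on queues.
        self.prescheduling = queue.Queue()  # web portal <-> RM
        self.scheduling = queue.Queue()     # RM <-> cluster machines
        # One extra queue per running execution, keyed by execution id.
        self.execution_queues = {}

    def launch_execution(self, execution_id, jobs):
        # Created when the execution is launched; holds its pending jobs.
        q = queue.Queue()
        for job in jobs:
            q.put({"execution": execution_id, "job": job})
        self.execution_queues[execution_id] = q
        return q

    def next_job(self, execution_id):
        # A machine asks for its next pending job; None means nothing is left.
        try:
            return self.execution_queues[execution_id].get_nowait()
        except queue.Empty:
            return None

    def finish_execution(self, execution_id):
        # Destroyed when the execution finishes.
        self.execution_queues.pop(execution_id, None)
```

In this sketch, a launch request placed on the prescheduling queue would lead to a launch_execution call, the machines would then call next_job until it returns None, and finish_execution would remove the execution-specific queue.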
