As part of the model, directories are logical entities that provide file
organization. Each directory can have files or subdirectories (children)
associated with it. Files are stored in S3 inside defined buckets and folders.
Each file object is aware of its physical location in S3 and also knows its
parent directory. This way, the web application can handle an organized file
structure, and it can also enforce a first level of authorization by
associating each file with a single user (its owner). In addition to the data
model, a set of file browser views was necessary to facilitate user
interaction.
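The relationships described above can be illustrated with a minimal sketch. It is not the e-Clouds implementation: the class names (`Directory`, `FileObject`), the field names, and the owner-only `can_access` check are assumptions made for illustration, chosen only to mirror the text (logical directories, files that know their S3 location and parent, one owner per file).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Directory:
    """Logical directory: it exists only in the model, not in S3."""
    name: str
    parent: Optional["Directory"] = field(default=None, repr=False)
    subdirectories: List["Directory"] = field(default_factory=list)
    files: List["FileObject"] = field(default_factory=list)

    def add_subdirectory(self, name: str) -> "Directory":
        child = Directory(name=name, parent=self)
        self.subdirectories.append(child)
        return child

    def path(self) -> str:
        # Walk up the parent chain to build the logical path.
        if self.parent is None:
            return "/" + self.name
        return self.parent.path() + "/" + self.name


@dataclass(eq=False)
class FileObject:
    """File record: knows its S3 location, its parent directory, and its owner."""
    name: str
    bucket: str   # hypothetical field: S3 bucket holding the object
    key: str      # hypothetical field: S3 key ("folder" path plus file name)
    owner: str
    parent: Directory = field(repr=False)

    def __post_init__(self) -> None:
        # Register the file with its parent directory.
        self.parent.files.append(self)

    def s3_uri(self) -> str:
        return f"s3://{self.bucket}/{self.key}"

    def can_access(self, user: str) -> bool:
        # First level of authorization: only the owner may access the file.
        return user == self.owner
```

Note that the logical path shown in a file browser (`Directory.path()`) and the physical location (`FileObject.s3_uri()`) are independent in this sketch, so a directory could be renamed or moved without touching the stored object.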
6.5.2.1.2 Transactional Data
As part of the e-Clouds solution, there is a central database management
system for storing data related to transactions. This includes, but is not
limited to, basic data such as user profiles, security associations, resource
usage, S3 file metadata, applications, and, of course, user executions. Besides
application and execution information, the database contains what is expected
in a standard web application. Database connections can only be established by
the web portal and the RM, which enhances security and makes administration
(updates, tests, etc.) easier.
6.5.2.1.3 Local Storage
The main purpose of local storage is to store execution-related data on each
cluster machine. It is primarily used as low-latency, low-cost storage for
installation files, libraries, input files, and execution results. All
information that resides in local storage is considered ephemeral, so every
time an execution finishes, output files and logs should be uploaded to S3 and
indexed in the transactional data. Everything else on local storage is
erased once the machine shuts down.
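The end-of-execution step can be sketched as follows. This is an illustration under stated assumptions, not the e-Clouds code: the `output` and `logs` subdirectory names, the `executions/<id>/...` key layout, the `upload` callable, and the list used as an index are all hypothetical. In a real deployment, `upload` would wrap an S3 client call and the index would be a table in the transactional database.

```python
from pathlib import Path
from typing import Callable, Dict, List


def finalize_execution(
    work_dir: Path,
    execution_id: str,
    upload: Callable[[Path, str], None],
    index: List[Dict[str, str]],
) -> List[str]:
    """Persist the outputs and logs of a finished execution.

    Only files under the (hypothetical) "output" and "logs" subdirectories
    are uploaded and indexed. Everything else in work_dir is left alone,
    since it is ephemeral and disappears when the machine shuts down.
    """
    uploaded: List[str] = []
    for sub in ("output", "logs"):
        base = work_dir / sub
        if not base.is_dir():
            continue
        for path in sorted(p for p in base.rglob("*") if p.is_file()):
            relative = path.relative_to(work_dir).as_posix()
            key = f"executions/{execution_id}/{relative}"  # assumed key layout
            upload(path, key)                               # store the object
            index.append({"execution_id": execution_id, "key": key})
            uploaded.append(key)
    return uploaded
```

Passing the uploader in as a callable keeps the sketch testable without network access and makes the ordering explicit: a file is indexed only after its upload call has returned.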
6.5.2.2 Queue Messaging
Reliable message queues are the main communication channel between
the different components that make up e-Clouds. In this first version, AWS
Simple Queue Service (SQS) is used. Figure 6.3 shows how the information
flows between the queues and the corresponding communicating entities.
It is important to note that there are two main, always-on queues: the
prescheduling and scheduling queues. In addition, there is one queue for each
user execution, used mainly for job assignment. It is created when
the execution is launched and destroyed when the execution finishes.
The prescheduling queue carries the messages exchanged between the web
portal and the RM. The scheduling queue carries the initial messages
that go from the RM to all the machines in a cluster, as well as the state
updates sent back by these same machines. Finally, execution-specific queues
are used to assign pending jobs to the associated machines.
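The queue topology and the lifecycle of an execution-specific queue can be traced with a small in-memory simulation. This sketch does not use the SQS API: `QueueService` is a stand-in, and the queue names, message fields, and `execution-<id>` naming scheme are assumptions made for illustration only.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class QueueService:
    """In-memory stand-in for a message queue service (not the SQS API)."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[dict]] = {}

    def create(self, name: str) -> None:
        self._queues.setdefault(name, deque())

    def delete(self, name: str) -> None:
        self._queues.pop(name, None)

    def exists(self, name: str) -> bool:
        return name in self._queues

    def send(self, name: str, message: dict) -> None:
        self._queues[name].append(message)

    def receive(self, name: str) -> Optional[dict]:
        queue = self._queues[name]
        return queue.popleft() if queue else None


PRESCHEDULING = "prescheduling"  # web portal <-> RM
SCHEDULING = "scheduling"        # RM <-> cluster machines


def execution_queue(execution_id: str) -> str:
    # Hypothetical naming scheme for the per-execution queue.
    return f"execution-{execution_id}"


def simulate_execution(service: QueueService, execution_id: str,
                       jobs: List[str]) -> List[dict]:
    # The two main queues are always on.
    service.create(PRESCHEDULING)
    service.create(SCHEDULING)

    # Web portal -> RM: request to launch an execution.
    service.send(PRESCHEDULING, {"type": "launch", "execution": execution_id})
    request = service.receive(PRESCHEDULING)

    # RM: create the execution-specific queue and fill it with pending jobs.
    jobs_queue = execution_queue(request["execution"])
    service.create(jobs_queue)
    for job in jobs:
        service.send(jobs_queue, {"job": job})

    # RM -> machines: initial message on the scheduling queue.
    service.send(SCHEDULING, {"type": "start", "queue": jobs_queue})
    start = service.receive(SCHEDULING)

    # Machine: take jobs from the execution queue, report state on scheduling.
    while (assignment := service.receive(start["queue"])) is not None:
        service.send(SCHEDULING, {"type": "state", "job": assignment["job"],
                                  "state": "done"})

    # RM: collect the state updates, then destroy the execution queue.
    updates: List[dict] = []
    while (update := service.receive(SCHEDULING)) is not None:
        updates.append(update)
    service.delete(jobs_queue)
    return updates
```

For example, `simulate_execution(QueueService(), "42", ["a", "b"])` returns one state update per job and leaves only the two always-on queues in the service. A real deployment would also have to deal with concurrent consumers and message redelivery, which this single-threaded sketch ignores.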