Cloud-Hosted Data Storage Systems - Cloud Data Management

Amazon: S3/SimpleDB/Amazon RDS

Amazon Simple Storage Service (S3) is an online public storage web service offered by Amazon Web Services. Conceptually, S3 is an infinite store for objects of variable sizes. An object is simply a byte container that is identified by a URI. Clients can read and update S3 objects remotely using a simple web services (SOAP or REST-based) interface. For example, get(uri) returns an object and put(uri, bytestream) writes a new version of the object. In principle, S3 can be considered an online backup solution or a store for archiving large objects that are not frequently updated.
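To make the get/put contract concrete, here is a minimal in-memory sketch. It assumes a hypothetical ToyObjectStore class that only mimics the two operations described above. It is not the S3 API, which real clients reach through the REST interface or an SDK. It also keeps every written version, which a real bucket does only when versioning is enabled.

```python
from typing import Dict, List


class ToyObjectStore:
    """Hypothetical in-memory stand-in for the get/put contract described for S3.

    Each URI maps to a list of versions: put() appends a new version and
    get() returns the latest one. Illustration only, not the S3 API.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, List[bytes]] = {}

    def put(self, uri: str, bytestream: bytes) -> int:
        # A write never modifies bytes in place; it records a new version.
        versions = self._objects.setdefault(uri, [])
        versions.append(bytes(bytestream))
        return len(versions)  # 1-based number of the version just written

    def get(self, uri: str) -> bytes:
        # Return the most recent version; unknown URIs raise KeyError.
        versions = self._objects.get(uri)
        if not versions:
            raise KeyError(uri)
        return versions[-1]


store = ToyObjectStore()
store.put("s3://bucket/report.pdf", b"draft")      # hypothetical URI
version = store.put("s3://bucket/report.pdf", b"final")
print(version, store.get("s3://bucket/report.pdf"))  # 2 b'final'
```

Whole-object writes like these are why S3 suits archival data that changes rarely. Updating a single byte still means putting the whole object again.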

Amazon has not published details on the implementation of S3. However, Brantner et al. [85] have presented initial efforts to build Web-based database applications on top of S3. They described various protocols for storing, reading, and updating objects and indexes using S3. For example, the record manager component is designed to manage records, where each record is composed of a key and payload data. Both key and payload are bytestreams of arbitrary length; the only constraint is that the size of the whole record must be smaller than the page size. Physically, each record is stored in exactly one page, which in turn is stored as a single object in S3. Logically, each record is part of a collection (e.g., a table).
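The record-to-page layout described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol of Brantner et al.:

- A plain dict stands in for S3 (URI → page bytes).
- The page size is a tiny, hypothetical 64 bytes, so pages fill quickly.
- Pages use an ad hoc length-prefixed encoding.
- An in-memory dict, rather than a real index, remembers which page holds each key.

The class and method names are our own. The sketch covers the create, read, update, and scan operations a record manager offers.

```python
import struct
from typing import Dict, Iterator, List, Tuple

PAGE_SIZE = 64  # hypothetical and tiny, so the example fills pages quickly
HEADER = struct.calcsize(">II")  # per-record prefix: key length, payload length


def encode_page(records: Dict[bytes, bytes]) -> bytes:
    # A page is a sequence of (key length, payload length, key, payload) entries.
    out = bytearray()
    for key, payload in records.items():
        out += struct.pack(">II", len(key), len(payload)) + key + payload
    return bytes(out)


def decode_page(blob: bytes) -> Dict[bytes, bytes]:
    records: Dict[bytes, bytes] = {}
    pos = 0
    while pos < len(blob):
        klen, plen = struct.unpack_from(">II", blob, pos)
        pos += HEADER
        key = blob[pos:pos + klen]
        pos += klen
        records[key] = blob[pos:pos + plen]
        pos += plen
    return records


class RecordManager:
    """Each record lives in exactly one page; each page is one stored object."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self.store = store  # stand-in for S3: URI -> page bytes
        self.pages: Dict[str, List[str]] = {}  # collection -> its page URIs
        self.where: Dict[Tuple[str, bytes], str] = {}  # (collection, key) -> page URI

    def create(self, collection: str, key: bytes, payload: bytes) -> None:
        size = HEADER + len(key) + len(payload)
        if size >= PAGE_SIZE:
            raise ValueError("a record must be smaller than the page size")
        if (collection, key) in self.where:
            raise KeyError(key)
        uris = self.pages.setdefault(collection, [])
        # Reuse the collection's last page if the record fits, else open a new page.
        if uris and len(self.store[uris[-1]]) + size <= PAGE_SIZE:
            uri = uris[-1]
        else:
            uri = f"{collection}/page-{len(uris)}"
            uris.append(uri)
            self.store[uri] = b""
        page = decode_page(self.store[uri])
        page[key] = payload
        self.store[uri] = encode_page(page)  # the whole page is written as one object
        self.where[(collection, key)] = uri

    def read(self, collection: str, key: bytes) -> bytes:
        return decode_page(self.store[self.where[(collection, key)]])[key]

    def update(self, collection: str, key: bytes, payload: bytes) -> None:
        if HEADER + len(key) + len(payload) >= PAGE_SIZE:
            raise ValueError("a record must be smaller than the page size")
        uri = self.where[(collection, key)]
        page = decode_page(self.store[uri])
        page[key] = payload
        blob = encode_page(page)
        if len(blob) <= PAGE_SIZE:
            self.store[uri] = blob
            return
        # The record outgrew its page: remove it there and place it again.
        del page[key]
        self.store[uri] = encode_page(page)
        del self.where[(collection, key)]
        self.create(collection, key, payload)

    def scan(self, collection: str) -> Iterator[Tuple[bytes, bytes]]:
        for uri in self.pages.get(collection, []):
            yield from decode_page(self.store[uri]).items()


s3: Dict[str, bytes] = {}
rm = RecordManager(s3)
rm.create("users", b"alice", b"x" * 20)
rm.create("users", b"bob", b"y" * 20)    # still fits in users/page-0
rm.create("users", b"carol", b"z" * 20)  # page-0 is full, so users/page-1 is opened
rm.update("users", b"alice", b"w" * 30)  # no longer fits in page-0, moves to page-2
print(sorted(s3))  # ['users/page-0', 'users/page-1', 'users/page-2']
print([k for k, _ in rm.scan("users")])  # [b'bob', b'carol', b'alice']
```

Because a page is the unit of storage, even a one-record change rewrites a whole S3 object. That cost is what the buffer pool in the page manager is meant to amortize.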

The record manager provides functions to create new objects, read objects, update objects, and scan collections. The page manager component implements a buffer pool for S3 pages. It supports reading pages from S3, pinning the pages in the buffer pool, updating the pages in the buffer pool, and marking the pages as updated. All these functionalities are implemented in a straightforward way, just as in any standard database system. Furthermore, the page manager implements the commit and abort methods, where it is assumed that the write set of a transaction (i.e., the set of updated and newly created pages) fits into the client's main memory or secondary storage (flash or disk). If an application commits, all the updates are propagated to S3 and all the affected pages are marked as unmodified in the client's buffer pool.
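A buffer pool with pin, update, commit, and abort, as described above, might look like the following minimal sketch. It makes these assumptions:

- A dict again stands in for S3.
- The whole write set is kept in the client's main memory.
- The class PageManager, the Frame record, and their method names are hypothetical.

Commit here simply overwrites each updated page in the store, one at a time. It is therefore not atomic and is not the commit protocol of Brantner et al.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Frame:
    data: bytes
    pins: int = 0
    dirty: bool = False  # "marked as updated" in the text's terms


class PageManager:
    """Hypothetical buffer pool over an object store modelled as a dict (URI -> bytes)."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self.store = store
        self.pool: Dict[str, Frame] = {}

    def pin(self, uri: str) -> bytes:
        # On a miss, read the page from the store; then pin it in the pool.
        frame = self.pool.get(uri)
        if frame is None:
            frame = Frame(self.store[uri])
            self.pool[uri] = frame
        frame.pins += 1
        return frame.data

    def unpin(self, uri: str) -> None:
        self.pool[uri].pins -= 1

    def update(self, uri: str, data: bytes) -> None:
        # Change (or create) the page in the pool only and mark it as updated.
        frame = self.pool.setdefault(uri, Frame(b""))
        frame.data = data
        frame.dirty = True

    def commit(self) -> None:
        # Propagate every updated page to the store, then mark it unmodified.
        # Pages are written one by one, so this sketch is not atomic.
        for uri, frame in self.pool.items():
            if frame.dirty:
                self.store[uri] = frame.data
                frame.dirty = False

    def abort(self) -> None:
        # Discard updated pages so later pins re-read the committed version.
        for uri in [u for u, f in self.pool.items() if f.dirty]:
            del self.pool[uri]


s3 = {"t/page-0": b"v1"}
pm = PageManager(s3)
pm.pin("t/page-0")
pm.update("t/page-0", b"v2")
pm.update("t/page-1", b"new")  # newly created page, also part of the write set
print(s3)  # {'t/page-0': b'v1'}  (nothing is sent before commit)
pm.commit()
print(s3)  # {'t/page-0': b'v2', 't/page-1': b'new'}
pm.update("t/page-0", b"v3")
pm.abort()
print(pm.pin("t/page-0"))  # b'v2'
```

Keeping uncommitted pages local until commit is what makes abort cheap: nothing has to be undone in S3. The price is the assumption, stated in the text, that the write set fits on the client.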

Moreover, they implemented standard B-tree indexes on top of the page manager, as well as basic redo log records. On the other hand, many database-specific issues have not yet been addressed by this work. For example, DB-style strict consistency and transaction mechanisms are not provided. Furthermore, query processing techniques (e.g., join algorithms and query optimization techniques) and traditional database functionalities, such as bulk-loading a database, creating indexes, and dropping a whole collection, still need to be devised.

SimpleDB is another Amazon service, designed to provide structured data storage in the cloud and backed by clusters of Amazon-managed database servers. It is a highly available and flexible non-relational data store that offloads the work of database administration. Storing data in SimpleDB does not require any predefined schema information. Developers simply store and query data items via web service requests, and Amazon SimpleDB does the rest. There is no rule that forces every data item (data record) to have the same fields. However, the lack of schema also means that there are no data types, as all data values are treated as
