Structure Definition
It's easier to express the structure of both structured and unstructured data when it's organized into multiple levels. Consider the example of storing your family photos. You might have a top-level folder named Family Photos with subfolders for each member of the family, or subfolders based on location or event. This hierarchy can, in principle, continue to any depth. Imagine if you were not allowed to create subfolders and had just a single flat folder holding all your data; accessing that data manually would be a nightmare.
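As a purely illustrative Python sketch (the folder and file names are invented for this example), an explicit hierarchy makes data easy to address, whereas a flat folder of opaque file names does not:

# Illustrative only: the same photos in a multi-level hierarchy versus
# one flat folder. With explicit structure, "all of Alice's photos" is
# a trivial prefix match; in the flat folder it is guesswork.
from pathlib import PurePosixPath

hierarchical_layout = [
    "Family Photos/Alice/2019-Hawaii/beach.jpg",
    "Family Photos/Bob/Birthday/cake.jpg",
]
flat_layout = ["IMG_0001.jpg", "IMG_0002.jpg"]  # whose photo? which event?

alice_photos = [p for p in hierarchical_layout
                if PurePosixPath(p).parts[:2] == ("Family Photos", "Alice")]
print(alice_photos)  # ['Family Photos/Alice/2019-Hawaii/beach.jpg']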
Despite these and many other advantages that have made hierarchical file systems so popular over the past decades, they fall acutely short at the scale of data storage and read/write operations at which the cloud operates. Read/write latency, a lack of APIs, and limited scalability are among the challenges we would face if we stuck with legacy hierarchical file systems.
Read/Write Latency
Legacy file systems like NTFS have an elaborate mechanism for accessing data stored on the physical medium (magnetic disks or SSDs). Because the data is laid out hierarchically, reading it first requires decoding the hierarchy and computing the actual physical address of the memory blocks on the storage medium.
Any system that accesses data from a storage device has to compute the address of the required memory blocks and go through a similar process, but legacy file systems require additional hops because of the hierarchical layout. This additional processing may be tolerable on an individual machine or even in a small-scale cluster deployment, but it becomes a significant showstopper at cloud scale, where petabytes of data are processed every single day, often across physically separated data centers.
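To make these additional hops concrete, here is a minimal toy model in Python. It is not how NTFS actually resolves paths (real file systems walk on-disk metadata structures); the nested dictionaries and block labels are invented to show that a hierarchical lookup costs one hop per level, while a flat keyed namespace resolves in a single lookup:

# Toy model: hierarchical path resolution pays one lookup per level;
# a flat key/value namespace (the object-storage approach) pays one.
hierarchy = {
    "Family Photos": {
        "Alice": {"beach.jpg": "blocks 17-42"},
        "Bob": {"cake.jpg": "blocks 43-60"},
    }
}
flat = {
    "Family Photos/Alice/beach.jpg": "blocks 17-42",
    "Family Photos/Bob/cake.jpg": "blocks 43-60",
}

def resolve_hierarchical(path):
    node, hops = hierarchy, 0
    for part in path.split("/"):   # one hop per directory level
        node = node[part]
        hops += 1
    return node, hops

def resolve_flat(key):
    return flat[key], 1            # a single keyed lookup

print(resolve_hierarchical("Family Photos/Alice/beach.jpg"))  # ('blocks 17-42', 3)
print(resolve_flat("Family Photos/Alice/beach.jpg"))          # ('blocks 17-42', 1)

At cloud scale, avoiding those per-level hops on every one of billions of daily reads and writes is precisely what flat, key-addressed storage buys.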
Lack of APIs
The cloud is one big application programming interface (API). Almost every resource available in the cloud can be reserved, provisioned, consumed, and discarded through vendor APIs. The reason vendors create API abstractions over all their cloud infrastructure and service resources is that the cloud was made for the Web. Legacy file systems, in contrast, were not primarily made for the Web. Early systems like FAT16 and FAT32 were made primarily for single users, with little or no consideration given to network connectivity while they were developed. This is why NTFS, and later protocols focused on clustered data storage, emerged: data would need to be accessed over the network, and end users would no longer control the actual physical storage medium where the data was stored.
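As a hedged sketch of what "one big API" means in practice, the snippet below provisions, consumes, and discards an object purely over HTTP. The endpoint, bucket, and key are hypothetical, and a real object store such as Amazon S3 would additionally require authenticated, signed requests:

# Hypothetical S3-style endpoint; a real service requires request signing.
import requests

url = "https://objects.example.com/family-photos/alice/beach.jpg"

with open("beach.jpg", "rb") as f:
    requests.put(url, data=f)         # provision/create the object
photo = requests.get(url).content     # consume/read the object
requests.delete(url)                  # discard/delete the object

No mount points, drive letters, or block addresses appear anywhere: every operation is a web request, which is exactly the property legacy file systems were never designed around.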
The cloud is a virtualized abstraction over a group of servers interconnected by a high-speed network. In a virtualized environment, you never know exactly which physical resources have been allocated to you. The VM that the cloud provider has spun up at your request and made available to you is a complete abstraction over the actual compute, storage, and networking resources in the vendor's data centers.
The cloud would not have survived if we did not have object-based storage. Yet object-based storage is also one of the stumbling blocks that prevents cloud migration for enterprises.