Structure Definition
It's easier to express the structure of both structured and unstructured data when it's organized into multiple levels. Consider the example of storing your family photos. You might have a top-level folder named Family Photos with subfolders for each member of the family, or subfolders based on location or event. This hierarchy can, in principle, continue to any depth. Imagine if you were not allowed to create subfolders and had just a single flat folder holding all your data; accessing that data manually would be a nightmare.
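As a purely illustrative Python sketch (the folder and file names are invented for this example), an explicit hierarchy makes data easy to address, whereas a flat folder of opaque file names does not:

# Illustrative only: the same photos in a multi-level hierarchy versus
# one flat folder. With explicit structure, "all of Alice's photos" is
# a trivial prefix match; in the flat folder it is guesswork.
from pathlib import PurePosixPath

hierarchical_layout = [
    "Family Photos/Alice/2019-Hawaii/beach.jpg",
    "Family Photos/Bob/Birthday/cake.jpg",
]
flat_layout = ["IMG_0001.jpg", "IMG_0002.jpg"]  # whose photo? which event?

alice_photos = [p for p in hierarchical_layout
                if PurePosixPath(p).parts[:2] == ("Family Photos", "Alice")]
print(alice_photos)  # ['Family Photos/Alice/2019-Hawaii/beach.jpg']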
Despite these and many other advantages that have made hierarchical file systems so popular over the past decades, they fall acutely short at the scale of data storage and read/write operations at which the cloud operates. Read/write latency, a lack of APIs, and limited scalability are among the challenges we would face if we stuck with legacy hierarchical file systems.
Read/Write Latency
Legacy file systems like NTFS have an elaborate mechanism for accessing data stored on the physical medium (magnetic disks or SSDs). Because the data is laid out hierarchically, reading it first requires decoding the hierarchy and computing the actual physical address of the memory blocks on the storage medium.
Any system that accesses data from a storage device has to compute the address of the required memory blocks and go through a similar process, but legacy file systems require additional hops because of the hierarchical layout. This additional processing may be tolerable on an individual machine or even in a small-scale cluster deployment, but it becomes a significant showstopper at cloud scale, where petabytes of data are processed every single day, often across physically separated data centers.
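To make these additional hops concrete, here is a minimal toy model in Python. It is not how NTFS actually resolves paths (real file systems walk on-disk metadata structures); the nested dictionaries and block labels are invented to show that a hierarchical lookup costs one hop per level, while a flat keyed namespace resolves in a single lookup:

# Toy model: hierarchical path resolution pays one lookup per level;
# a flat key/value namespace (the object-storage approach) pays one.
hierarchy = {
    "Family Photos": {
        "Alice": {"beach.jpg": "blocks 17-42"},
        "Bob": {"cake.jpg": "blocks 43-60"},
    }
}
flat = {
    "Family Photos/Alice/beach.jpg": "blocks 17-42",
    "Family Photos/Bob/cake.jpg": "blocks 43-60",
}

def resolve_hierarchical(path):
    node, hops = hierarchy, 0
    for part in path.split("/"):   # one hop per directory level
        node = node[part]
        hops += 1
    return node, hops

def resolve_flat(key):
    return flat[key], 1            # a single keyed lookup

print(resolve_hierarchical("Family Photos/Alice/beach.jpg"))  # ('blocks 17-42', 3)
print(resolve_flat("Family Photos/Alice/beach.jpg"))          # ('blocks 17-42', 1)

At cloud scale, avoiding those per-level hops on every one of billions of daily reads and writes is precisely what flat, key-addressed storage buys.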
Lack of APIs
The cloud is one big application programming interface (API). Almost every resource available in the cloud can be reserved, provisioned, consumed, and discarded through vendor APIs. The reason vendors create API abstractions over all their cloud infrastructure and service resources is that the cloud was made for the Web. Legacy file systems, in contrast, were not primarily made for the Web. Early systems like FAT16 and FAT32 were made primarily for single users, with little or no consideration given to network connectivity while they were developed. This is why NTFS, and later protocols focused on clustered data storage, emerged: data would need to be accessed over the network, and end users would no longer control the actual physical storage medium where the data was stored.
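As a hedged sketch of what "one big API" means in practice, the snippet below provisions, consumes, and discards an object purely over HTTP. The endpoint, bucket, and key are hypothetical, and a real object store such as Amazon S3 would additionally require authenticated, signed requests:

# Hypothetical S3-style endpoint; a real service requires request signing.
import requests

url = "https://objects.example.com/family-photos/alice/beach.jpg"

with open("beach.jpg", "rb") as f:
    requests.put(url, data=f)         # provision/create the object
photo = requests.get(url).content     # consume/read the object
requests.delete(url)                  # discard/delete the object

No mount points, drive letters, or block addresses appear anywhere: every operation is a web request, which is exactly the property legacy file systems were never designed around.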
The cloud is a virtualized abstraction over a group of servers interconnected by a high-speed network. In a virtualized environment, you never know exactly which physical resources have been allocated to you. The VM that the cloud provider has spun up at your request and made available to you is a complete abstraction over the actual compute, storage, and networking resources in the vendor's data centers.
The cloud would not have survived if we did not have object-based storage. Yet object-based storage is also one of the stumbling blocks that prevents cloud migration for enterprises.