Understanding Cloud Characteristics - Deploying and Managing a Cloud Infrastructure


Legacy file storage systems do not expose the metadata of a file other than what the operating system reveals. The concept of defining custom metadata for a file is also alien to legacy file systems.

Data/Blob

In the world of structured data, a blob is considered a lump of raw data that cannot be operated on. Even though there is structure in every type of data that exists today, the definition of structure in terms of data is limited to data elements that can be operated on by a database system. A list of patient names, their addresses, and SSNs would be an example of structured data, whereas X-ray files, 3D scan files, and similar visual data would be classified as unstructured data.

A relational database system can run SQL queries on the structured data, but it treats unstructured data as a lump that can be identified only by some form of associated structured data, such as an ID or patient name in this case. You cannot “sort” X-ray images based on their brightness or some other image-related characteristic, but you can always sort a patient list in ascending order of SSNs.
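
The contrast can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommended schema: the table, the column names, the placeholder SSNs, and the byte strings standing in for X-ray files are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: structured patient fields plus an opaque X-ray blob.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (ssn TEXT PRIMARY KEY, name TEXT, xray BLOB)"
)
rows = [
    ("000-00-0003", "Carol", b"xray-bytes-3"),
    ("000-00-0001", "Alice", b"xray-bytes-1"),
    ("000-00-0002", "Bob", b"xray-bytes-2"),
]
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", rows)

# Structured columns can be sorted and filtered with plain SQL.
names_by_ssn = [
    name for (name,) in conn.execute("SELECT name FROM patients ORDER BY ssn")
]

# The blob is only reachable through its structured key; the database
# hands back the bytes as-is and knows nothing about their content.
(xray,) = conn.execute(
    "SELECT xray FROM patients WHERE ssn = ?", ("000-00-0002",)
).fetchone()
```

The ORDER BY clause works because SSN is a value the database understands. Nothing comparable exists for the xray column: the database can return the bytes but cannot rank them by any property of the image.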

An RDBMS is one of the least efficient ways to store unstructured data. Consider a typical example of a customer table with the following columns:

serial_no

checkin_date

SSN

treatment_id

Suppose you are building an online electronic medical record (EMR) application and you have to keep a record of every test and its result, including all imagery/audio data of the patient within the system, and be able to fetch them on demand. This would mean that you insert all this “unstructured” data as blobs into the database and hurt the overall query performance, because a single row of maybe a few kilobytes has now bloated to anywhere from a few megabytes to hundreds of megabytes or even gigabytes. This would be the most inefficient but highly convenient way to do this, because all the information for any given patient is centralized in the database. Databases were just not made to embrace unstructured data; storing it there was always an afterthought, or a default option for constructing your application quickly at the cost of performance.

Within the cloud, your data is considered a blob and put into the same pool as any other structured or unstructured data element. The recommended practice, though, is to store structured data in document-based storage and route unstructured data to Amazon S3, Microsoft Azure Storage, or Google Cloud Storage.
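
The recommended split might look like the following sketch. Two in-memory dictionaries stand in for the document store and the object storage service, so none of this is a real vendor API; the function names, the key layout, and the record fields are hypothetical.

```python
import uuid

# In-memory stand-ins (assumptions for illustration only): a real system
# would use a document database and an object storage service instead.
document_store: dict[str, dict] = {}
object_store: dict[str, bytes] = {}


def save_test_result(patient_id: str, fields: dict, payload: bytes) -> str:
    """Store the large payload as an object and keep only its key in the record."""
    object_key = f"scans/{patient_id}/{uuid.uuid4()}"
    object_store[object_key] = payload
    record_id = str(uuid.uuid4())
    document_store[record_id] = {
        "patient_id": patient_id,
        **fields,
        "object_key": object_key,
        "size_bytes": len(payload),
    }
    return record_id


def load_payload(record_id: str) -> bytes:
    """Fetch the payload on demand by following the key stored in the record."""
    return object_store[document_store[record_id]["object_key"]]


# A 1 KiB placeholder stands in for an X-ray file.
record_id = save_test_result("p-001", {"test": "chest x-ray"}, b"\x00" * 1024)
```

The structured record stays small no matter how large the scan is, because it carries only a key and a size. The payload is read only when load_payload is called.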

Extended Metadata

As we discussed previously, associating metadata with an object, both system-generated metadata and custom key-value metadata created by the user, helps attach additional information, especially to unstructured data objects. Extended metadata primarily consists of a unique identifier for every single object in the cloud vendor's pool, which helps in fetching the object regardless of where it is physically located.
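
One way to picture this is the sketch below, which models an object carrying a unique identifier, system-generated metadata, and custom key-value metadata. The class, its field names, and the location strings are assumptions for illustration and do not mirror any particular vendor's object model.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StoredObject:
    data: bytes
    location: str  # hypothetical physical placement, e.g. a region name
    custom_metadata: dict[str, str] = field(default_factory=dict)
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    system_metadata: dict[str, str] = field(default_factory=dict, init=False)

    def __post_init__(self) -> None:
        # System-generated metadata is derived by the store, not set by the user.
        self.system_metadata = {
            "size_bytes": str(len(self.data)),
            "created_at": datetime.now(timezone.utc).isoformat(),
        }


# The pool is keyed by the unique identifier only, never by location.
pool: dict[str, StoredObject] = {}


def put(obj: StoredObject) -> str:
    pool[obj.object_id] = obj
    return obj.object_id


xray = StoredObject(
    data=b"placeholder-image-bytes",
    location="region-a",
    custom_metadata={"patient_id": "p-001", "modality": "x-ray"},
)
object_id = put(xray)

# Moving the object changes its location but not its identifier.
xray.location = "region-b"
fetched = pool[object_id]
```

Because the lookup uses only object_id, the caller never needs to know where the bytes currently live. The custom metadata travels with the object and supplies the context that the raw bytes lack.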
