have led to the creation of ever increasing volumes of data on a daily basis. This
data is also diverse, ranging from images and videos (e.g., from mobile phones being
uploaded to websites such as Facebook and YouTube), to 24/7 digital TV broadcasts
and surveillance footage (e.g., from hundreds of thousands of security cameras), to
large scientific experiments (e.g., the Large Hadron Collider [35]), which produce
many terabytes of data every single day. IDC's latest Digital Universe Study predicts
a 300-fold increase in the volume of data globally, from 130 exabytes in 2012 to
30,000 exabytes in 2020 [23].
Organizations are trying to leverage, or in fact simply cope with, the vast and diverse
volumes of data (or Big Data) that are growing at an unprecedented rate. For instance,
Google, Yahoo!, and Facebook have gone from processing gigabytes and terabytes
of data to the petabyte range [37]. This puts immense pressure on their computing
and storage infrastructures that need to be available 24/7 and scale seamlessly as the
amount of data produced rises exponentially. Virtualization provides a reliable and
elastic environment, which ensures that computing and storage infrastructures can
effectively tolerate faults, achieve higher availability and scale as needed to handle
large volumes and varied types of data, especially when the extent of data volumes
is not known a priori. This allows efficient reaction to unanticipated demands of
Big Data analytics, better enforcement of Quality of Service (QoS) guarantees, and
satisfactory fulfillment of Service Level Agreements (SLAs). In addition, virtualization
facilitates the manageability and portability of Big Data platforms by abstracting data
from its underpinnings and removing the dependency on the underlying physical
hardware. Lastly, virtualization provides the foundation that enables many of the
cloud services to be used either for Big Data analytics or as data sources in Big Data
analytics. For instance, Amazon Web Services added a new service, Amazon Elastic
MapReduce (EMR), to enable businesses, researchers, data analysts, and developers
to easily and cost effectively process Big Data [4]. EMR exploits virtualization and
uses Hadoop [28] as an underlying framework hosted on Amazon EC2 and Amazon
Simple Storage Service (Amazon S3) [5]. Stock Hadoop MapReduce can also read
from and write to Amazon S3, exactly as it does with the Hadoop Distributed
File System (HDFS) [24,28].
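The MapReduce model that EMR builds on can be illustrated with a short sketch in plain Python. This is a toy simulation of the map, shuffle, and reduce phases for a word count, not the Hadoop or EMR API; function names here are illustrative only:

```python
from collections import defaultdict

def map_phase(docs):
    # Map: emit a (word, 1) pair for every word in every input document.
    for doc in docs:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values; here, sum the counts per word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data on the cloud", "the cloud scales big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)
```

In Hadoop, the same map and reduce functions would run in parallel across many nodes, with the framework handling the shuffle over the network; swapping HDFS paths for S3 paths changes only where the input and output live, not this programming model.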
16.1.6 Mixed-OS Environment
As shown in Figure 16.3 and pointed out in Section 16.1.4, a single hardware plat-
form can support multiple OSs simultaneously. This provides great flexibility for
users where they can install their own OSs, libraries, and applications. For instance,
a user can install one OS for office productivity tools and another OS for application
development and testing, all on a single desktop computer or on the cloud (e.g., on
Amazon EC2).
16.1.7 Facilitating Research
Running an OS on a VM allows the hypervisor to instrument accesses to hardware
resources and count specific event types (e.g., page faults), or even log detailed
information about the nature and origin of each event and how the corresponding
operation was satisfied.
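This kind of instrumentation can be sketched with a toy model. The class below is entirely hypothetical (no real hypervisor exposes this interface): it stands in for a hypervisor that intercepts every guest memory access, counts page faults, and logs each event's origin and how it was handled:

```python
from collections import Counter

class ToyHypervisor:
    """Toy model of hypervisor-level instrumentation: intercepts guest
    memory accesses, counts event types, and logs event details."""

    def __init__(self, resident_pages):
        self.resident = set(resident_pages)  # pages already backed by a frame
        self.event_counts = Counter()        # per-event-type counters
        self.event_log = []                  # detailed per-event records

    def access(self, vm_id, page):
        # Every access traps to the "hypervisor", so it can be counted
        # and, when it faults, logged with its origin and resolution.
        if page not in self.resident:
            self.event_counts["page_fault"] += 1
            self.event_log.append({"vm": vm_id, "page": page,
                                   "event": "page_fault",
                                   "handled_by": "demand_paging"})
            self.resident.add(page)  # fault handled: page is now resident
        self.event_counts["access"] += 1

hv = ToyHypervisor(resident_pages={0, 1})
for page in [0, 2, 2, 3, 1]:
    hv.access("vm-1", page)
print(hv.event_counts)
```

Accesses to pages 2 and 3 each fault exactly once; repeated accesses hit the now-resident pages. A researcher can inspect the counters and the log offline, which is precisely the visibility that running the OS under a hypervisor makes possible without modifying the guest.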