have led to the creation of ever increasing volumes of data on a daily basis. This
data is also diverse, ranging from images and videos (e.g., from mobile phones being
uploaded to websites such as Facebook and YouTube), to 24/7 digital TV broadcasts
and surveillance footage (e.g., from hundreds of thousands of security cameras), to
large scientific experiments (e.g., the Large Hadron Collider [35]), which produce
many terabytes of data every single day. IDC's latest Digital Universe Study predicts
a 300-fold increase in the volume of data globally, from 130 exabytes in 2012 to
30,000 exabytes in 2020 [23].
Organizations are trying to leverage, or in fact simply cope with, the vast and diverse
volumes of data (or Big Data) that are growing at an unprecedented rate. For instance,
Google, Yahoo!, and Facebook have gone from processing gigabytes and terabytes
of data to the petabyte range [37]. This puts immense pressure on their computing
and storage infrastructures that need to be available 24/7 and scale seamlessly as the
amount of data produced rises exponentially. Virtualization provides a reliable and
elastic environment, which ensures that computing and storage infrastructures can
effectively tolerate faults, achieve higher availability and scale as needed to handle
large volumes and varied types of data, especially when the extent of data volumes
is not known a priori. This allows efficient reaction to unanticipated demands of
Big Data analytics, better enforcement of Quality of Service (QoS) guarantees, and
satisfactory fulfillment of Service Level Agreements (SLAs). In addition, virtualization
facilitates the manageability and portability of Big Data platforms by abstracting data
from its underpinnings and removing the dependency on the underlying physical
hardware. Lastly, virtualization provides the foundation that enables many of the
cloud services to be used either for Big Data analytics or as data sources in Big Data
analytics. For instance, Amazon Web Services added a new service, Amazon Elastic
MapReduce (EMR), to enable businesses, researchers, data analysts, and developers
to easily and cost effectively process Big Data [4]. EMR exploits virtualization and
uses Hadoop [28] as an underlying framework hosted on Amazon EC2 and Amazon
Simple Storage Service (Amazon S3) [5]. Stock Hadoop MapReduce can also read
from and write to Amazon S3, exactly as it does with the Hadoop Distributed
File System (HDFS) [24,28].
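The MapReduce model that EMR builds on can be illustrated with a short sketch in plain Python. This is a toy simulation of the map, shuffle, and reduce phases for a word count, not the Hadoop or EMR API; function names here are illustrative only:

```python
from collections import defaultdict

def map_phase(docs):
    # Map: emit a (word, 1) pair for every word in every input document.
    for doc in docs:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values; here, sum the counts per word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data on the cloud", "the cloud scales big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)
```

In Hadoop, the same map and reduce functions would run in parallel across many nodes, with the framework handling the shuffle over the network; swapping HDFS paths for S3 paths changes only where the input and output live, not this programming model.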
16.1.6 Mixed-OS Environment
As shown in Figure 16.3 and pointed out in Section 16.1.4, a single hardware plat-
form can support multiple OSs simultaneously. This provides great flexibility for
users where they can install their own OSs, libraries, and applications. For instance,
a user can install one OS for office productivity tools and another OS for application
development and testing, all on a single desktop computer or on the cloud (e.g., on
Amazon EC2).
16.1.7 Facilitating Research
Running an OS on a VM allows the hypervisor to instrument accesses to hardware
resources and count specific event types (e.g., page faults), or even log detailed
information about the nature and origin of each event and how the corresponding
operation was satisfied.
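This kind of instrumentation can be sketched with a toy model. The class below is entirely hypothetical (no real hypervisor exposes this interface): it stands in for a hypervisor that intercepts every guest memory access, counts page faults, and logs each event's origin and how it was handled:

```python
from collections import Counter

class ToyHypervisor:
    """Toy model of hypervisor-level instrumentation: intercepts guest
    memory accesses, counts event types, and logs event details."""

    def __init__(self, resident_pages):
        self.resident = set(resident_pages)  # pages already backed by a frame
        self.event_counts = Counter()        # per-event-type counters
        self.event_log = []                  # detailed per-event records

    def access(self, vm_id, page):
        # Every access traps to the "hypervisor", so it can be counted
        # and, when it faults, logged with its origin and resolution.
        if page not in self.resident:
            self.event_counts["page_fault"] += 1
            self.event_log.append({"vm": vm_id, "page": page,
                                   "event": "page_fault",
                                   "handled_by": "demand_paging"})
            self.resident.add(page)  # fault handled: page is now resident
        self.event_counts["access"] += 1

hv = ToyHypervisor(resident_pages={0, 1})
for page in [0, 2, 2, 3, 1]:
    hv.access("vm-1", page)
print(hv.event_counts)
```

Accesses to pages 2 and 3 each fault exactly once; repeated accesses hit the now-resident pages. A researcher can inspect the counters and the log offline, which is precisely the visibility that running the OS under a hypervisor makes possible without modifying the guest.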