Chapter 8. Cloud Computing and Virtualization
Most Hadoop clusters today run on “real iron,” that is, on small, Intel-based computers
running some variant of the Linux operating system with directly attached storage. However,
you might want to run Hadoop in a cloud or virtual environment. While virtualization usually
comes with some degree of performance degradation, you may find that the degradation is
minimal for your task set, or that it's a worthwhile trade-off for the benefits of cloud
computing. These benefits include low up-front costs and the ability to scale up (and
sometimes down) as your dataset and analytic needs change.
For our definition of cloud computing, we'll follow the guidelines established by the
National Institute of Standards and Technology (NIST), which publishes a definition of cloud
computing. By that definition, a Hadoop cluster in the cloud will have:
▪ On-demand self-service
▪ Network access
▪ Resource sharing
▪ Rapid elasticity
▪ Measured resource service
While these resources need not be virtualized, in practice they usually are.
Virtualization means creating virtual, as opposed to real, computing entities. Frequently, the
virtualized object is an operating system on which software or applications are overlaid, but
storage and networks can also be virtualized. Virtualization is not a new computing
technology: in 1972 IBM released VM/370, in which the 370 mainframe could be divided into
many small, single-user virtual machines. Currently, Amazon Web Services is likely the
best-known cloud-computing facility. For a brief explanation of virtualization, see the
Wikipedia article on the topic.
The official Hadoop perspective on cloud computing and virtualization is explained on this
Wikipedia page. One guiding principle of Hadoop is that data analytics should be run on