nodes in the cluster close to the data. Why? Transporting blocks of data across a cluster dimin-
ishes performance. Because blocks of HDFS files are normally stored three times, it's likely
that MapReduce can choose to run your jobs on the datanodes where the data is stored.
In a naive virtual environment, the physical location of the data is not known, and in fact, the
real physical storage may be someplace that is not on any node in the cluster at all.
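The locality preference described above can be sketched as a toy scheduler. This is purely illustrative (the names and data structures are invented for this example, not Hadoop's actual APIs): given the nodes that hold a block's replicas, the scheduler prefers one of them so the task reads its data locally.

```python
# Toy illustration of HDFS-style locality-aware task placement.
# Structures and names are illustrative, not Hadoop's real scheduler.

# Map of block ID -> the datanodes holding its replicas
# (HDFS stores each block on three nodes by default).
block_replicas = {
    "blk_001": {"node1", "node2", "node3"},
    "blk_002": {"node2", "node4", "node5"},
}

def pick_task_node(block_id, available_nodes):
    """Prefer a node that already holds a replica of the block,
    so the task reads the data locally instead of over the network."""
    local = block_replicas[block_id] & available_nodes
    if local:
        return sorted(local)[0]       # data-local: no block transfer needed
    return sorted(available_nodes)[0] # fallback: read the block remotely

print(pick_task_node("blk_001", {"node2", "node9"}))  # node2 (data-local)
print(pick_task_node("blk_002", {"node8", "node9"}))  # node8 (remote read)
```

In a naive virtual environment this preference breaks down: the scheduler's notion of "node" no longer corresponds to where the bytes physically live, so even a "data-local" choice may trigger a network transfer.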
Good background reading on virtualizing Hadoop is available, though admittedly from a VMware perspective.
In this chapter, you'll read about some of the open source software that facilitates cloud com-
puting and virtualization. There are also proprietary solutions, but they're not covered in this
edition of the Field Guide to Hadoop.
Serengeti
License: Apache License, Version 2.0
Activity: Medium
Purpose: Hadoop Virtualization
Official Page:
Hadoop Integration: No Integration
If your organization uses VMware's vSphere as the basis of its virtualization strategy, then
Serengeti provides you with a method of quickly building Hadoop clusters in your environ-
ment. Admittedly, vSphere is a proprietary environment, but the code to run Hadoop in it
is open source. Though Serengeti is not affiliated with the Apache Software