Large-Scale File Systems and Map-Reduce - Mining of Massive Datasets

Databases Reference

In-Depth Information

2.1

Distributed File Systems

Most computing is done on a single processor, with its main memory, cache, and

local disk (a compute node). In the past, applications that called for parallel

processing, such as large scientific calculations, were done on special-purpose

parallel computers with many processors and specialized hardware. However,

the prevalence of large-scale Web services has caused more and more computing

to be done on installations with thousands of compute nodes operating more

or less independently. In these installations, the compute nodes are commodity

hardware, which greatly reduces the cost compared with special-purpose parallel

machines.

These new computing facilities have given rise to a new generation of pro-

gramming systems. These systems take advantage of the power of parallelism

and at the same time avoid the reliability problems that arise when the comput-

ing hardware consists of thousands of independent components, any of which

could fail at any time. In this section, we discuss both the characteristics of

these computing installations and the specialized file systems that have been

developed to take advantage of them.

2.1.1

Physical Organization of Compute Nodes

The new parallel-computing architecture, sometimes called cluster computing,

is organized as follows. Compute nodes are stored on racks, perhaps 8-64

on a rack. The nodes on a single rack are connected by a network, typically

gigabit Ethernet. There can be many racks of compute nodes, and racks are

connected by another level of network or a switch. The bandwidth of inter-rack

communication is somewhat greater than the intrarack Ethernet, but given the

number of pairs of nodes that might need to communicate between racks, this

bandwidth may be essential. Figure 2.1 suggests the architecture of a large-

scale computing system. However, there may be many more racks and many

more compute nodes per rack.

It is a fact of life that components fail, and the more components, such as

compute nodes and interconnection networks, a system has, the more frequently

something in the system will not be working at any given time. For systems

such as Fig. 2.1, the principal failure modes are the loss of a single node (e.g.,

the disk at that node crashes) and the loss of an entire rack (e.g., the network

connecting its nodes to each other and to the outside world fails).

Some important calculations take minutes or even hours on thousands of

compute nodes. If we had to abort and restart the computation every time

one component failed, then the computation might never complete successfully.

The solution to this problem takes two forms:

1. Files must be stored redundantly. If we did not duplicate the file at several

compute nodes, then if one node failed, all its files would be unavailable

until the node is replaced. If we did not back up the files at all, and the

Mining of Massive Datasets

Search WWH ::

Custom Search

Home