Clustering (Networking)

Clustering refers to the interconnection of servers in a way that makes them appear to the operating environment as a single system. As such, the cluster draws on the power of all the servers to handle the demanding processing requirements of a broad range of technical applications. It also takes advantage of parallel processing in program execution. Shared resources in a cluster may include physical hardware devices such as disk drives and network cards, TCP/IP addresses, entire applications, and databases. The cluster service is a collection of software on each node that manages all cluster-specific activity, including traffic flow and load balancing. The nodes are linked together by standard Ethernet, FDDI, ATM, or Fibre Channel connections.
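The "single system" idea described above can be sketched in a few lines: clients talk to one entry point, and the cluster service decides which node actually handles each request. This is a minimal illustration, not a real product's API; the node names and round-robin policy are assumptions for the example.

```python
from itertools import cycle

class ClusterService:
    """Presents several nodes to clients as one system (illustrative)."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._rotation = cycle(self._nodes)  # simple rotation over nodes

    def dispatch(self, request):
        # Round-robin is the simplest traffic-balancing policy; real
        # cluster services also weigh current load and node health.
        node = next(self._rotation)
        return f"{node} handled {request!r}"

cluster = ClusterService(["node-a", "node-b", "node-c"])
for i in range(4):
    print(cluster.dispatch(f"request-{i}"))
```

The client only ever calls `cluster.dispatch()`; which physical machine does the work is an internal decision, which is the essence of the cluster appearing as a single system.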

Advantages

There are many advantages to clustering:

PERFORMANCE Throughput and response time are improved by using a group of machines at the same time.

AVAILABILITY If one node fails, the workload is redistributed among the other nodes for uninterrupted operation.

INCREMENTAL GROWTH Performance and availability can be enhanced by adding more nodes to the cluster.

SCALING Theoretically, there is no limit on the number of machines that can belong to the cluster, although in practice interconnect bandwidth and management overhead impose limits.

PRICE AND PERFORMANCE The individual nodes of a cluster typically offer very good performance for their price. Because clustering does not involve the addition of expensive high-performance processors, buses, or cooling systems, the cluster retains the price/performance advantage of its individual members.

Comparison with SMP

Another form of parallel computing is the symmetric multiprocessor (SMP), which has been around since the early 1970s. An SMP computer has multiple processors, each with the same capabilities. The computer’s operating system distributes the processing tasks among two or more processors. Each processor can run the operating system as well as user applications. Not only can any processor execute any job, but the jobs can be shifted from one processor to another as the load changes.

Traditionally, SMPs and clusters have been considered competitive technologies, even though they can coexist. Nevertheless, there are some differences. For example, each machine in a cluster has its own local memory, and communication with other machines is less efficient than access to a machine’s own memory. In addition, each machine in a cluster has its own attached I/O. Access to another machine’s I/O is less efficient than access to a machine’s own I/O. By contrast, an SMP does not have multiple I/O systems or memories, and each processor has equal access to every location in the I/O system and memory.

Despite all this, there are some performance advantages to clustering. As the number of processors increases, SMP designs require that more memory and I/O bandwidth be added as well, which increases the cost. Since clusters remain intrinsically balanced in their memory and I/O capabilities (because each node comes with its own memory and I/O subsystems), performance increases are a natural result of adding more nodes.

At one time, SMPs had advantages over clusters in the areas of software and usability—specifically, central system administration, load balancing, and middleware support. With new server operating system enhancements and extensions, however, clusters offer the same capabilities and features.

Cluster Software

Because they comprise multiple nodes, clusters require special software. For example, there are products for batch job submission. These products typically perform at least rudimentary load balancing as well and allow users to query the status of jobs. The degree to which the multiple-machine nature of the cluster is hidden varies among products. Some also provide checkpoint/restart facilities for more reliable operation; others provide cluster-wide administration and accounting services.
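A batch-submission layer of the kind just described can be sketched as follows. This is a hedged illustration, not any real product's interface: jobs go to the least-loaded node, and users can query a job's status by ID.

```python
class BatchScheduler:
    """Illustrative batch job submission with rudimentary load balancing."""

    def __init__(self, nodes):
        self._load = {node: 0 for node in nodes}  # running jobs per node
        self._jobs = {}                           # job id -> (node, state)
        self._next_id = 0

    def submit(self, job_name):
        # Rudimentary load balancing: pick the node with the fewest jobs.
        node = min(self._load, key=self._load.get)
        self._load[node] += 1
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = (node, "running")
        return job_id

    def status(self, job_id):
        # Users can query where a job is and what state it is in.
        node, state = self._jobs[job_id]
        return f"{state} on {node}"

    def complete(self, job_id):
        node, _ = self._jobs[job_id]
        self._load[node] -= 1
        self._jobs[job_id] = (node, "done")

sched = BatchScheduler(["node-a", "node-b"])
j0 = sched.submit("render")   # both nodes idle, goes to node-a
j1 = sched.submit("compile")  # node-a now busy, goes to node-b
print(sched.status(j0))
```

The degree to which the user sees the node name at all is the "hiding" question the text raises: a product that hides the cluster fully would report only the job state, not the node.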

Every major vendor of database management systems has a version of its product that operates across multiple computers. Such systems effectively merge the machines into a single database entity for application and database administration purposes.

There are also software packages that enable applications to be reprogrammed to run in parallel. When an application is reorganized or "optimized" to execute simultaneously on multiple nodes in a cluster, it can run faster.

Another kind of software is the operating system itself. Special extensions to existing operating systems such as Microsoft Windows NT enable multiple nodes running the operating system to be managed as if they were one machine, and they appear to client machines as a single system. Clients connect to the cluster, not to any single machine, and the computational load is automatically balanced across the machines. Should one or more machines in the cluster fail, clients never notice: the remaining nodes automatically pick up the load without clients having to reconnect or take any other action.

UNIX operating systems provide similar capabilities and go further, detecting software failures in addition to hardware failures. This capability is especially important, since software failures far outnumber hardware failures. The degree of software-failure coverage varies among individual products, but all can detect operating system failures within the cluster.
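Failure detection of the kind described in the last two paragraphs is commonly built on heartbeats: each node's cluster software periodically reports that it is alive, and a node whose report is overdue is treated as failed, whether the cause was hardware or a crashed operating system. The sketch below assumes that design; the class and its timeout are illustrative, not taken from any particular cluster product.

```python
import time

class HeartbeatMonitor:
    """Illustrative heartbeat-based failure detector for cluster nodes."""

    def __init__(self, nodes, timeout=5.0):
        self.timeout = timeout
        now = time.monotonic()
        # Assume every node was healthy when monitoring started.
        self._last_seen = {node: now for node in nodes}

    def heartbeat(self, node):
        # Called whenever a node's cluster software checks in.
        self._last_seen[node] = time.monotonic()

    def failed_nodes(self, now=None):
        # Any node silent for longer than the timeout is declared failed;
        # this catches OS crashes as well as dead hardware, since a hung
        # operating system stops sending heartbeats too.
        if now is None:
            now = time.monotonic()
        return [n for n, t in self._last_seen.items()
                if now - t > self.timeout]
```

On detecting a failure, the surviving nodes would reassign the failed node's workload among themselves, which is what keeps clients unaware that anything happened.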

Last Word

Traditionally, clusters were used in very high-end UNIX and proprietary server environments to improve application uptime and increase overall processing capacity. Today the rapid growth of applications such as Web serving, electronic commerce, and enterprise resource planning (ERP) is beginning to push high-availability, high-performance clustering technologies into the commercial mainstream. This migration has been facilitated by the increasing performance and decreasing prices of microprocessors, which make clusters a cost-effective way to meet the needs of server systems.
