This leads us to the problem of finding lines that are really remote. One solution is to give each page a home machine, in the sense that its directory entry resides there, but not necessarily the data. Then a message can be sent to the home machine to at least locate the cache line. Other schemes involve organizing memory as a tree and searching upward until the line is found.
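The home-machine idea can be sketched in a few lines. This is a minimal simulation, not any real machine's protocol; the names (home_of, record_copy, locate, NUM_NODES) and the hash-by-line-number home assignment are illustrative assumptions.

```python
# Hypothetical sketch: each cache line's directory entry lives at a fixed
# home node, but the data itself may be anywhere. To find a line, ask its home.
NUM_NODES = 4
LINE_SIZE = 64

def home_of(address, num_nodes=NUM_NODES):
    # Illustrative home assignment: hash the line number across the nodes.
    line = address // LINE_SIZE
    return line % num_nodes

# directory[home][line] -> set of nodes currently holding a copy of that line
directory = [dict() for _ in range(NUM_NODES)]

def record_copy(address, node):
    # Update the home node's directory when a node obtains a copy.
    line = address // LINE_SIZE
    directory[home_of(address)].setdefault(line, set()).add(node)

def locate(address):
    # One message to the home machine suffices to learn who holds the line.
    line = address // LINE_SIZE
    return directory[home_of(address)].get(line, set())

record_copy(0x1040, node=2)
print(sorted(locate(0x1040)))  # prints [2]: node 2 holds the line
```

The key point the sketch captures is that only the directory entry has a fixed location; the copies themselves migrate freely, and locating one costs a single message to the home.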
The second problem in the list above relates to not purging the last copy. As in CC-NUMA, a cache line may be at multiple nodes at once. When a cache miss occurs, a line must be fetched, which usually means a line must be thrown out. What happens if the line chosen happens to be the last copy? In that case, it cannot be thrown out.
One solution is to go back to the directory and check to see if there are other
copies. If so, the line can be safely thrown out. Otherwise, it has to be migrated
somewhere else. Another solution is to label one copy of each cache line as the
master copy and never throw it out. This solution avoids the need to check with
the directory. All in all, COMA promises better performance than CC-NUMA, but few COMA machines have been built, so more experience is needed. The first two COMA machines built were the KSR-1 (Burkhardt et al., 1992) and the Data Diffusion Machine (Hagersten et al., 1992). More recent papers on COMA are Vu et al. (2008) and Zhang and Jesshope (2008).
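The two eviction policies just described can be sketched as one decision procedure. This is an illustrative simulation only; the function and variable names (evict, directory, master, migrate_to) are assumptions, not part of any real COMA implementation.

```python
def evict(line, node, directory, master, migrate_to):
    """Decide what to do when `node` must evict `line`.
    Policy 2: a line's designated master copy is never thrown out.
    Policy 1: otherwise, consult the directory; discard only if another
    copy exists, and migrate the line if this is the last copy."""
    holders = directory[line]          # set of nodes holding this line
    if master.get(line) == node:
        return "kept-master"           # master copy may never be evicted
    if len(holders) > 1:
        holders.discard(node)          # other copies exist: safe to discard
        return "discarded"
    holders.discard(node)              # last copy: migrate, never destroy
    holders.add(migrate_to)
    return "migrated"

directory = {7: {0, 3}, 9: {1}}        # line -> holders
master = {7: 0}                        # line 7's master copy lives at node 0
print(evict(7, 3, directory, master, migrate_to=2))  # prints "discarded"
print(evict(9, 1, directory, master, migrate_to=2))  # prints "migrated"
print(evict(7, 0, directory, master, migrate_to=2))  # prints "kept-master"
```

The trade-off the sketch makes visible: the master-copy rule avoids a directory query on every eviction, at the cost of constraining which copy each node may discard.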
8.4 MESSAGE-PASSING MULTICOMPUTERS
As we saw in Fig. 8-23, the two kinds of MIMD parallel processors are multiprocessors and multicomputers. In the previous section we studied multiprocessors. We saw that they appear to the operating system as having shared memory that can be accessed using ordinary LOAD and STORE instructions. This shared memory can be implemented in many ways, as we have seen, including snooping buses, data crossbars, multistage switching networks, and various directory-based schemes. Nevertheless, programs written for a multiprocessor can just access any location in memory without knowing anything about the internal topology or implementation scheme. This illusion is what makes multiprocessors so attractive and why programmers like this programming model.
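The attraction of this model is that a program touches shared data with ordinary reads and writes. As a rough analogy (threads standing in for CPUs, a Python list standing in for shared memory, a lock standing in for the coherence the hardware provides transparently), the sketch below shows four "CPUs" updating one shared location with no knowledge of buses, switches, or directories:

```python
import threading

shared = [0]                 # one word of "shared memory"
lock = threading.Lock()      # stands in for the hardware's coherence guarantees

def cpu(n_increments):
    # Each "CPU" just loads, modifies, and stores the shared word; it knows
    # nothing about how the memory system is implemented underneath.
    for _ in range(n_increments):
        with lock:
            shared[0] += 1

threads = [threading.Thread(target=cpu, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared[0])  # prints 4000
```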
On the other hand, multiprocessors also have their limitations, which is why
multicomputers are important, too. First and foremost, multiprocessors do not
scale to large sizes. We saw the enormous amount of hardware Sun had to use to
get the E25K to scale to 72 CPUs. In contrast, we will study a multicomputer
below that has 65,536 CPUs. It will be years before anyone builds a commercial
65,536-node multiprocessor. By then million-node multicomputers will be in use.
In addition, memory contention in a multiprocessor can severely affect performance. If 100 CPUs are all trying to read and write the same variables constantly, contention for the various memories, buses, and directories can cause an enormous performance hit.