This leads us to the problem of finding lines that are really remote. One solution is to give each page a home machine, in the sense that its directory entry resides there, but not necessarily the data. Then a message can be sent to the home machine to at least locate the cache line. Other schemes involve organizing memory as a tree and searching upward until the line is found.
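The home-machine idea can be sketched in a few lines. This is a minimal simulation, not any real machine's protocol; the names (home_of, record_copy, locate, NUM_NODES) and the hash-by-line-number home assignment are illustrative assumptions.

```python
# Hypothetical sketch: each cache line's directory entry lives at a fixed
# home node, but the data itself may be anywhere. To find a line, ask its home.
NUM_NODES = 4
LINE_SIZE = 64

def home_of(address, num_nodes=NUM_NODES):
    # Illustrative home assignment: hash the line number across the nodes.
    line = address // LINE_SIZE
    return line % num_nodes

# directory[home][line] -> set of nodes currently holding a copy of that line
directory = [dict() for _ in range(NUM_NODES)]

def record_copy(address, node):
    # Update the home node's directory when a node obtains a copy.
    line = address // LINE_SIZE
    directory[home_of(address)].setdefault(line, set()).add(node)

def locate(address):
    # One message to the home machine suffices to learn who holds the line.
    line = address // LINE_SIZE
    return directory[home_of(address)].get(line, set())

record_copy(0x1040, node=2)
print(sorted(locate(0x1040)))  # prints [2]: node 2 holds the line
```

The key point the sketch captures is that only the directory entry has a fixed location; the copies themselves migrate freely, and locating one costs a single message to the home.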
The second problem in the list above relates to not purging the last copy. As in CC-NUMA, a cache line may be at multiple nodes at once. When a cache miss occurs, a line must be fetched, which usually means a line must be thrown out. What happens if the line chosen happens to be the last copy? In that case, it cannot be thrown out.
One solution is to go back to the directory and check to see if there are other
copies. If so, the line can be safely thrown out. Otherwise, it has to be migrated
somewhere else. Another solution is to label one copy of each cache line as the
master copy and never throw it out. This solution avoids the need to check with
the directory. All in all, COMA promises better performance than CC-NUMA, but few COMA machines have been built, so more experience is needed. The first two COMA machines built were the KSR-1 (Burkhardt et al., 1992) and the Data Diffusion Machine (Hagersten et al., 1992). More recent papers on COMA are Vu et al. (2008) and Zhang and Jesshope (2008).
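The two eviction policies just described can be sketched as one decision procedure. This is an illustrative simulation only; the function and variable names (evict, directory, master, migrate_to) are assumptions, not part of any real COMA implementation.

```python
def evict(line, node, directory, master, migrate_to):
    """Decide what to do when `node` must evict `line`.
    Policy 2: a line's designated master copy is never thrown out.
    Policy 1: otherwise, consult the directory; discard only if another
    copy exists, and migrate the line if this is the last copy."""
    holders = directory[line]          # set of nodes holding this line
    if master.get(line) == node:
        return "kept-master"           # master copy may never be evicted
    if len(holders) > 1:
        holders.discard(node)          # other copies exist: safe to discard
        return "discarded"
    holders.discard(node)              # last copy: migrate, never destroy
    holders.add(migrate_to)
    return "migrated"

directory = {7: {0, 3}, 9: {1}}        # line -> holders
master = {7: 0}                        # line 7's master copy lives at node 0
print(evict(7, 3, directory, master, migrate_to=2))  # prints "discarded"
print(evict(9, 1, directory, master, migrate_to=2))  # prints "migrated"
print(evict(7, 0, directory, master, migrate_to=2))  # prints "kept-master"
```

The trade-off the sketch makes visible: the master-copy rule avoids a directory query on every eviction, at the cost of constraining which copy each node may discard.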
8.4 MESSAGE-PASSING MULTICOMPUTERS
As we saw in Fig. 8-23, the two kinds of MIMD parallel processors are multiprocessors and multicomputers. In the previous section we studied multiprocessors. We saw that they appear to the operating system as having shared memory that can be accessed using ordinary LOAD and STORE instructions. This shared memory can be implemented in many ways, as we have seen, including snooping buses, data crossbars, multistage switching networks, and various directory-based schemes. Nevertheless, programs written for a multiprocessor can just access any location in memory without knowing anything about the internal topology or implementation scheme. This illusion is what makes multiprocessors so attractive and why programmers like this programming model.
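The attraction of this model is that a program touches shared data with ordinary reads and writes. As a rough analogy (threads standing in for CPUs, a Python list standing in for shared memory, a lock standing in for the coherence the hardware provides transparently), the sketch below shows four "CPUs" updating one shared location with no knowledge of buses, switches, or directories:

```python
import threading

shared = [0]                 # one word of "shared memory"
lock = threading.Lock()      # stands in for the hardware's coherence guarantees

def cpu(n_increments):
    # Each "CPU" just loads, modifies, and stores the shared word; it knows
    # nothing about how the memory system is implemented underneath.
    for _ in range(n_increments):
        with lock:
            shared[0] += 1

threads = [threading.Thread(target=cpu, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared[0])  # prints 4000
```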
On the other hand, multiprocessors also have their limitations, which is why
multicomputers are important, too. First and foremost, multiprocessors do not
scale to large sizes. We saw the enormous amount of hardware Sun had to use to
get the E25K to scale to 72 CPUs. In contrast, we will study a multicomputer
below that has 65,536 CPUs. It will be years before anyone builds a commercial
65,536-node multiprocessor. By then million-node multicomputers will be in use.
In addition, memory contention in a multiprocessor can severely affect performance. If 100 CPUs are all trying to read and write the same variables constantly, contention for the various memories, buses, and directories can cause an enormous performance hit.