A Glimpse of Parallel Computing - Elements of Scientific Computing


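$$
u^{\ell+1}_{i,j,k} = \alpha\, u^{\ell}_{i,j,k} + \beta \left( u^{\ell}_{i-1,j,k} + u^{\ell}_{i,j-1,k} + u^{\ell}_{i,j,k-1} + u^{\ell}_{i+1,j,k} + u^{\ell}_{i,j+1,k} + u^{\ell}_{i,j,k+1} \right) \tag{10.3}
$$

for $1 \le i, j, k \le n-1$, where $\alpha = 1 - 6\Delta t/h^2$ and $\beta = \Delta t/h^2$.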
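To make the update rule concrete, here is a minimal sketch of one time step of scheme (10.3) in plain Python. It is written for readability rather than speed. The function name `explicit_step`, the nested-list storage, and the treatment of boundary points (copied unchanged, i.e., fixed boundary values) are assumptions made for illustration; the text above does not prescribe them.

```python
def explicit_step(u, dt, h):
    """Advance the grid u by one time step of scheme (10.3); return a new grid.

    u is an (n+1) x (n+1) x (n+1) nested list. Boundary points are copied
    unchanged, which is an assumption (fixed boundary values).
    """
    n = len(u) - 1
    alpha = 1.0 - 6.0 * dt / h**2
    beta = dt / h**2
    # Start from a copy so that boundary points keep their old values.
    new = [[row[:] for row in plane] for plane in u]
    for i in range(1, n):
        for j in range(1, n):
            for k in range(1, n):
                # Five additions to sum the six neighbours ...
                neighbours = (u[i - 1][j][k] + u[i][j - 1][k] + u[i][j][k - 1]
                              + u[i + 1][j][k] + u[i][j + 1][k] + u[i][j][k + 1])
                # ... plus two multiplications and one more addition.
                new[i][j][k] = alpha * u[i][j][k] + beta * neighbours
    return new


# Small illustration: a single unit spike in the middle of a 5 x 5 x 5 grid.
n = 4
h = 1.0 / n
dt = h**2 / 6.0  # largest time step allowed by the stability restriction
u0 = [[[0.0] * (n + 1) for _ in range(n + 1)] for _ in range(n + 1)]
u0[2][2][2] = 1.0
u1 = explicit_step(u0, dt, h)
print(u1[1][2][2], u1[2][2][2])
```

With the largest stable time step, $\alpha$ vanishes and the spike is spread evenly over its six neighbours, each receiving about one sixth of the original value.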

The above formula requires eight floating-point operations per inner grid point: two multiplications and six additions. That is, the total number of floating-point operations per time step is $8(n-1)^3$. Recall from Sect. 7.4.5 that explicit numerical schemes often have a strict restriction on the maximum time step size, which is

$$
\Delta t \le \frac{1}{6} h^2
$$

for this particular 3D case. The minimum number of time steps $N$ needed for solving (10.1) between $t = 0$ and $t = 1$ is consequently $N = 6/h^2 = 6n^2$. Therefore, the total number of floating-point operations for the entire computation is

$$
6n^2 \cdot 8(n-1)^3 = 48n^2(n-1)^3 \approx 48n^5.
$$
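The operation count above can be checked with a few lines of Python. This is only a sketch: the function name `explicit_scheme_cost` is hypothetical, and it assumes a mesh spacing of $h = 1/n$, which is what the identity $6/h^2 = 6n^2$ implies.

```python
def explicit_scheme_cost(n):
    """Return (time_steps, flops_per_step, total_flops) for 0 <= t <= 1.

    Assumes h = 1/n and the largest stable time step, dt = h^2 / 6.
    """
    time_steps = 6 * n**2              # N = 6 / h^2
    flops_per_step = 8 * (n - 1)**3    # 2 multiplications + 6 additions per inner point
    return time_steps, flops_per_step, time_steps * flops_per_step


for n in (10, 100, 1000):
    steps, per_step, total = explicit_scheme_cost(n)
    print(f"n={n:5d}  steps={steps:.3e}  flops/step={per_step:.3e}  "
          f"total={total:.3e}  48n^5={48 * n**5:.3e}")
```

The last two columns show how the exact count $48n^2(n-1)^3$ approaches the estimate $48n^5$ as $n$ grows.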

If we have $n = 1{,}000$, then the entire computation requires $48 \times 10^{15}$ floating-point operations. How much CPU time does it need to carry out these operations on a serial computer? Let us assume that an extremely fast serial computer has a peak performance of 48 GFLOPS, i.e., $48 \times 10^{9}$ FLOPS; then the total computation will require $10^{6}$ s, i.e., 278 h. This may not sound like an alarmingly long time. However, the sustainable performance of numerical schemes of type (10.3), which are computer-memory intensive, is normally far below the theoretical peak performance. This is due to the increasing gap between the processor speed and memory speed on modern microprocessors, commonly referred to as the "memory wall" problem [21]. Moreover, our simple model equation (10.1) has not considered variable coefficients, difficult boundary conditions, or source terms. Therefore, it is fair to say that a realistic 3D diffusion problem can require a lot more than the above theoretical CPU usage, making a serial computer totally unfit for the explicit scheme to work on a $1{,}000 \times 1{,}000 \times 1{,}000$ mesh.
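The runtime estimate can be reproduced, and its sensitivity to sustained performance explored, with a short Python sketch. The operation count and the 48 GFLOPS peak are taken from the text; the function name `runtime_hours` and the efficiency fractions of 10% and 1% are purely hypothetical values chosen for illustration, not measurements.

```python
def runtime_hours(total_flops, peak_flops, efficiency=1.0):
    """Estimated runtime in hours when running at a given fraction of peak."""
    seconds = total_flops / (peak_flops * efficiency)
    return seconds / 3600.0


total_flops = 48e15  # operations for n = 1,000 (from the text)
peak_flops = 48e9    # 48 GFLOPS peak performance (from the text)

# Hypothetical sustained fractions of peak; real values depend on the machine.
for efficiency in (1.0, 0.10, 0.01):
    hours = runtime_hours(total_flops, peak_flops, efficiency)
    print(f"efficiency {efficiency:5.0%}: {hours:10.0f} h")
```

At full peak performance the sketch gives roughly 278 h, matching the figure above; every factor of ten lost to memory-bound execution multiplies that time by ten.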

As another consideration, numerical simulators are frequently used as an experimental tool. Many different runs of the same simulator are typically needed, requiring the computing time of each simulation to be within, e.g., an hour, or ideally minutes.

It should be mentioned that there exist more computationally efficient methods

for solving ( 10.1 ) than the above explicit scheme. For example, a numerical method

with no stability constraint can use dramatically fewer time steps, but with much

more work per step. Nevertheless, the above simple example suffices to show that

serial computers clearly have a limit in computing speed. The bad news is that the

speed of a single CPU core is not expected to grow anymore in the future. Also, as
