b) A surprising amount of time was spent in the Fortran 95 intrinsic MAXVAL (for testing convergence in subroutine checon).
c) The most time-consuming operation is the Fortran 95 intrinsic MATMUL, and on the particular vector computer, it was running considerably slower than the peak machine speed.
Program 5.6 addresses all of these issues. First, unless freedoms are “tied” together (a device not used in this topic), we can be sure that entries in g are not duplicated, and so the scatter operation can be vectorised. A “compiler directive” (!dir$ ivdep in this case) is therefore inserted before the loop labelled elements_2a, enabling the loop to be vectorised.
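The scatter operation and the directive described above can be sketched as follows. This is a minimal illustration, not the listing of Program 5.6: the module name scatter_sketch, the subroutine scatter_add and the arrays u and utemp are assumptions made for the example. The directive is a promise to the compiler that no two iterations write to the same location, which holds only when g contains no repeated entries.

```fortran
! Sketch only: scatter_sketch, scatter_add, u and utemp are
! illustrative names, not taken from Program 5.6.
module scatter_sketch
  implicit none
  integer, parameter :: iwp = selected_real_kind(15)
contains
  subroutine scatter_add(u, utemp, g)
    ! Add the element contributions utemp into the global vector u
    ! at the positions listed in the steering vector g.
    real(iwp), intent(inout) :: u(:)
    real(iwp), intent(in)    :: utemp(:)
    integer,   intent(in)    :: g(:)
    integer :: i
    ! The directive asserts that the iterations are independent,
    ! i.e. that g holds no repeated entries. Compilers that do not
    ! recognise it simply treat the line as a comment.
    !dir$ ivdep
    do i = 1, size(g)
      u(g(i)) = u(g(i)) + utemp(i)
    end do
  end subroutine scatter_add
end module scatter_sketch

program demo_scatter
  use scatter_sketch
  implicit none
  real(iwp) :: u(6)
  u = 0.0_iwp
  ! g = (5, 2, 4) has no duplicates, so the promise made by the
  ! directive is valid for this call.
  call scatter_add(u, (/1.0_iwp, 2.0_iwp, 3.0_iwp/), (/5, 2, 4/))
  print '(6f6.1)', u
end program demo_scatter
```

After the call, u(2), u(4) and u(5) hold 2.0, 3.0 and 1.0 respectively and the other entries remain zero. If g did contain a repeated entry, the directive would be an incorrect promise and a vectorised loop could lose one of the contributions.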
Second, MAXVAL is replaced by its longhand equivalent. This is obviously a problem with the particular vendor, whose implementation of MAXVAL could be much improved. Third, and this is probably also a vendor problem, the MATMUL operation is changed from matrix-vector to matrix-matrix by collecting all the p(g) vectors into a global matrix g_pmul in the loop labelled elements_2. Otherwise the program is the same as Program 5.5 and of course produces the same results. However, Table 5.1 shows the progressive effects of making changes to the coding in Program 5.5.
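The second and third changes might look something like the sketch below. It is not the code of Program 5.6: the names vector_sketch, max_abs, batched_product, g_all, pmul and utemp are assumptions for illustration. The sketch also assumes that every element shares the same stiffness matrix km, since only then can the per-element matrix-vector products be merged into a single matrix-matrix product.

```fortran
! Sketch only: all names below are illustrative assumptions.
module vector_sketch
  implicit none
  integer, parameter :: iwp = selected_real_kind(15)
contains
  function max_abs(x) result(big)
    ! Longhand equivalent of MAXVAL(ABS(x)).
    real(iwp), intent(in) :: x(:)
    real(iwp) :: big
    integer :: i
    big = 0.0_iwp
    do i = 1, size(x)
      if (abs(x(i)) > big) big = abs(x(i))
    end do
  end function max_abs

  subroutine batched_product(km, p, g_all, u)
    ! Accumulate km * p(g) for every element into u using one
    ! matrix-matrix MATMUL rather than one matrix-vector MATMUL
    ! per element. Assumes the same km applies to all elements.
    real(iwp), intent(in)    :: km(:,:), p(:)
    integer,   intent(in)    :: g_all(:,:)   ! shape (ndof, nels)
    real(iwp), intent(inout) :: u(:)
    real(iwp) :: pmul(size(g_all,1), size(g_all,2))
    real(iwp) :: utemp(size(g_all,1), size(g_all,2))
    integer :: i, iel
    do iel = 1, size(g_all,2)        ! gather one column per element
      pmul(:,iel) = p(g_all(:,iel))
    end do
    utemp = matmul(km, pmul)         ! single matrix-matrix product
    do iel = 1, size(g_all,2)        ! scatter the results back
      do i = 1, size(g_all,1)
        u(g_all(i,iel)) = u(g_all(i,iel)) + utemp(i,iel)
      end do
    end do
  end subroutine batched_product
end module vector_sketch

program demo_vector
  use vector_sketch
  implicit none
  real(iwp) :: km(2,2), p(3), u(3)
  integer :: g_all(2,2)
  ! Two identical two-freedom elements joined end to end.
  km = reshape((/1.0_iwp, -1.0_iwp, -1.0_iwp, 1.0_iwp/), (/2, 2/))
  g_all = reshape((/1, 2, 2, 3/), (/2, 2/))
  p = (/1.0_iwp, 2.0_iwp, 4.0_iwp/)
  u = 0.0_iwp
  call batched_product(km, p, g_all, u)
  print '(3f6.1)', u
  print '(f6.1)', max_abs((/1.0_iwp, -3.5_iwp, 2.0_iwp/))
end program demo_vector
```

For this small case u comes out as (-1.0, -1.0, 2.0), which matches the product of the assembled 3 by 3 matrix with p, and max_abs returns 3.5. The gain on a vector machine comes from MATMUL working on long columns of pmul at once instead of on many short vectors.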
Table 5.1 Timings of vectorised programs

Coding change                   Time (seconds)
Original code (Program 5.5)     44.7
No dependency                   25.3
Replace MAXVAL                  21.6
Matrix-matrix (Program 5.6)      9.3
The speed-up of Program 5.6 over Program 5.5 on this particular vector computer was by a factor of about 5, and illustrates the importance of code analysis when using such machines.
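As a check on the quoted factor, the timings in Table 5.1 give the following ratios, with each change measured against the preceding row. The figures are simply quotients of the tabulated times, rounded to two decimal places.

```latex
\begin{align*}
\text{No dependency:} \quad & 44.7 / 25.3 \approx 1.77 \\
\text{Replace \texttt{MAXVAL}:} \quad & 25.3 / 21.6 \approx 1.17 \\
\text{Matrix-matrix:} \quad & 21.6 / 9.3 \approx 2.32 \\
\text{Overall:} \quad & 44.7 / 9.3 \approx 4.81
\end{align*}
```

On these figures the change to a matrix-matrix product contributes the largest single gain, and removing the assumed dependency the next largest.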
Glossary of variable names used in Chapter 5
Scalar integers:
cg_iters        pcg iteration counter
cg_limit        pcg iteration ceiling
fixed_freedoms  number of fixed displacements
i               simple counter
iel             simple counter
iflag           1 for “symmetry”, −1 for “antisymmetry”
iwp             SELECTED_REAL_KIND(15)
k               simple counter
loaded_nodes    number of loaded nodes
lth             harmonic on which loads are to be applied
ndim            number of dimensions
ndof            number of degrees of freedom per element
nels            number of elements
neq             number of degrees of freedom in the mesh