Since some $a_j$ can be either functions (composed of hyperbolic and exponential operations) or constants, the approach is to compute first the $a_j$ with lower CPI⁸ and only then the functions with higher CPIs (for example, math operations such as “×”, “+”, “-” and “÷” have a lower CPI than “$\sinh(x)$” and “$e^x$”). When $a_j = 0$, the algorithm skips the rest of the product and proceeds to compute the next $i$-th term, as shown in Fig. 4.

Fig. 4. Flow diagram of the optimization of the number of operations
This method avoids unnecessary operations whenever a product contains a zero factor, thus saving valuable clock cycles.
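The cheapest-first ordering and the zero skip can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the computation is assumed to be a sum over $i$ of products over $j$ of factors $a_j$; the names `Factor`, `const` and `sum_of_products` are hypothetical; and the integer costs merely stand in for CPI values and are made up for the example, not measured.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Factor:
    """One factor a_j of a product term (hypothetical representation)."""
    cost: int                      # assumed relative cost (CPI-like); lower is cheaper
    evaluate: Callable[[], float]  # computes the factor's value on demand


def const(value: float) -> Factor:
    """A constant factor; assumed to be the cheapest kind (cost 0)."""
    return Factor(cost=0, evaluate=lambda: value)


def sum_of_products(terms: List[List[Factor]]) -> Tuple[float, int]:
    """Return (sum over i of the product over j of a_j, number of factors evaluated)."""
    total = 0.0
    evaluated = 0
    for factors in terms:  # one entry per i-th term
        product = 1.0
        # Evaluate cheap factors first, so a zero is found before costly ones.
        for factor in sorted(factors, key=lambda f: f.cost):
            value = factor.evaluate()
            evaluated += 1
            if value == 0.0:
                product = 0.0
                break  # skip the rest of this product; move on to the next term
            product *= value
        total += product
    return total, evaluated


if __name__ == "__main__":
    x = 0.5
    # Hypothetical costs: constants cheapest, exp assumed cheaper than sinh.
    terms = [
        [Factor(40, lambda: math.sinh(x)), const(2.0), Factor(30, lambda: math.exp(x))],
        [Factor(40, lambda: math.sinh(x)), const(0.0), Factor(30, lambda: math.exp(x))],
    ]
    total, evaluated = sum_of_products(terms)
    print(total, evaluated)
```

With these inputs the second term is abandoned right after its zero constant is evaluated, so only four of the six factors are computed and neither $\sinh(x)$ nor $e^x$ is evaluated for that term.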
4.3 Use of Parallel Algorithms and Architectures
In order to reduce the processing time of complex computations, parallel techniques can be applied. For a given computational problem, parallelization can be achieved either with compiler directives [6] or manually. The former approach is not recommended for this particular problem: in complex problems like this one it yields lower performance, because the parallelization process depends on many factors, such as the algorithm structure⁹, the parallel computer architecture¹⁰ and the parallel programming model. Since there is no general method for parallelization, the series of steps described in [5] was followed for the parallelization process.
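To make the manual alternative concrete, the following Python sketch splits the iteration range of a loop by hand among a fixed number of workers and then combines their partial sums. With compiler directives (for example, OpenMP's `#pragma omp parallel for` in C), the compiler would generate this split instead. The names `partition` and `parallel_loop_sum` are hypothetical. Threads are used only to keep the sketch self-contained: under CPython's global interpreter lock, CPU-bound work would need processes (e.g. `ProcessPoolExecutor`) to actually run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def partition(n: int, workers: int) -> List[Tuple[int, int]]:
    """Split range(n) into `workers` contiguous [start, stop) chunks of near-equal size."""
    base, extra = divmod(n, workers)
    chunks: List[Tuple[int, int]] = []
    start = 0
    for w in range(workers):
        # The first `extra` chunks take one additional iteration each.
        stop = start + base + (1 if w < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def parallel_loop_sum(body: Callable[[int], float], n: int, workers: int = 4) -> float:
    """Manually parallelized equivalent of sum(body(i) for i in range(n))."""

    def run_chunk(bounds: Tuple[int, int]) -> float:
        start, stop = bounds
        return sum(body(i) for i in range(start, stop))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles one chunk; the partial sums are combined afterwards.
        partial_sums = list(pool.map(run_chunk, partition(n, workers)))
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_loop_sum(lambda i: (i % 7) * 0.5, 1000, workers=4))
```

The programmer chooses the chunking here (contiguous, near-equal blocks). With directives, that choice is delegated to the compiler and runtime, which is convenient but leaves less control over how the work maps onto the target architecture.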
Based on the flow diagram of Fig. 3, there are two types of partition schemes: Loop Partitioning (LP), applied to the VX loop, and Task Partitioning (TP), in which the tasks (see Fig. 3) are parallelized. As mentioned previously, these
⁸ CPI is the acronym of Clocks Per Instruction.
⁹ Refers to the data and task dependencies in the code.
¹⁰ Architectures characterized by distributed or shared memory, the number of processing units and the interconnection topology.