defined (line 2). Then, while the stop condition is not reached (line 3), the set
temp is initialized to the empty set (line 5). For each firefly j, FA tries to find
the brightest firefly i near j (line 6). FA calculates the attractiveness β_j(r_ji) of
firefly j with respect to i by using Equation 2. If β_i(r_ij) > β_j(r_ji), then firefly
j will move toward firefly i (line 7); otherwise firefly j will move randomly
(i = null, line 12). If j moves to i, m new fireflies are created by applying a
specific operator that considers the distance between j and i (lines 9 and 10).
In the case when firefly j moves randomly, m new random fireflies are created
(lines 13 and 14) taking j as the base. After p fireflies have moved, there are
p × m fireflies in temp. Then, the best p fireflies are chosen based on their
fitness (lines 18 and 19). When the evolutionary process ends, FA returns the
best one of those p fireflies (line 21). In this context, DFA has been shown to be
efficient when solving various combinatorial problems [8,14,15,36].
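The generation loop described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the helpers beta, distance, perturb, random_firefly, and fitness are hypothetical placeholders for the problem-specific operators.

```python
def dfa_generation(P, m, beta, distance, perturb, random_firefly, fitness):
    """One generation of the DFA loop described above (a sketch; the helpers
    beta, distance, perturb, random_firefly, and fitness are hypothetical
    placeholders for the problem-specific operators)."""
    p = len(P)
    temp = []  # temp is reset at the start of each generation (line 5)
    for j in P:
        # fireflies brighter than j according to the attractiveness comparison
        brighter = [i for i in P if i is not j
                    and beta(i, distance(i, j)) > beta(j, distance(j, i))]
        i = max(brighter, key=fitness, default=None)  # brightest candidate (line 6)
        if i is not None:
            # j moves toward i: m offspring via the distance-aware operator
            temp.extend(perturb(j, i) for _ in range(m))
        else:
            # no brighter firefly found: m random offspring based on j
            temp.extend(random_firefly(j) for _ in range(m))
    # p fireflies moved, so temp holds p * m candidates; keep the best p
    return sorted(temp, key=fitness, reverse=True)[:p]
```

With toy operators on integer "fireflies", one call produces the next population of size p from the p × m candidates.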
3 GPU-Firefly Algorithm
The goal of this section is to present our algorithmic proposal, which has been
called GPU-DFA. In short, our primary concern when designing DFA accelerated
by GPU is to create an efficient model that runs the main processes of
DFA entirely on GPU. The CUDA software model is employed so as to exploit
maximum parallel execution and high arithmetic intensity of GPUs. One of the
objectives is to minimize data transfers between the CPU and the GPU, thus
avoiding communication bottlenecks.
We follow a coarse-grained parallelization scheme (one thread per firefly or
per pair of fireflies, respectively). The flowchart of the GPU-DFA model is presented
in Fig. 2. The proposed algorithm begins by initializing the FA and GPU
parameters on the CPU side. Then, all parameters are
transferred to the GPU main memory. Next, we use a group of CUDA kernels
with the following tasks: first, GPU-DFA creates and evaluates each solution in P
per GPU thread. Afterwards, until the stop condition is reached, GPU-
DFA executes a series of kernels to evolve the current population. The division
into multiple kernels is due to the heterogeneity and complexity of the tasks.
In the evolution step, GPU-DFA first calculates the parameters r, β, and I
between each pair of fireflies in parallel. When evaluating the pairs of solutions
in parallel, it is necessary to apply a parallel reduction on the GPU so as to
determine whether each firefly in P has found a brighter firefly to move toward
or must instead move randomly. Once each firefly j has a defined movement, we
create and evaluate p × m new solutions (by disturbing each one with a specific
operator or randomly) in a separate kernel and save them in the temp population.
Finally, temp is sorted according to fitness using parallel Bitonic Sort [33], and
P is replaced with the p best fireflies from temp.
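The sorting step relies on the compare-exchange network of Bitonic Sort [33]. The following is a minimal sequential rendering of that network, not the actual GPU kernel; on the GPU, each inner pass executes its compare-exchanges with one thread per pair.

```python
def bitonic_sort(a):
    """In-place ascending bitonic sorting network for power-of-two lengths.
    Sequential rendering of the compare-exchange pattern a GPU kernel runs
    with one thread per pair."""
    n = len(a)
    assert n and n & (n - 1) == 0, "bitonic sort needs a power-of-two length"
    k = 2
    while k <= n:               # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:           # compare-exchange distance within a stage
            for i in range(n):  # on the GPU, each pair is handled by a thread
                l = i ^ j
                if l > i:
                    ascending = (i & k) == 0  # direction of this subsequence
                    if (a[i] > a[l]) == ascending:
                        a[i], a[l] = a[l], a[i]
            j //= 2
        k *= 2
```

After sorting the p × m fitness values in ascending order, the p best fireflies correspond to the last p entries; when p × m is not a power of two, the array is typically padded with sentinel values.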
The performance of CUDA code depends largely on the deployment of threads
on the GPU, the number of kernels, the memory access patterns, and the specific
functions optimized for the GPU (such as reductions, sorts, and random number
generators). All methods presented here aim at exploiting fast-access local and global memory
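Among these GPU-optimized primitives, the tree reduction used to decide each firefly's movement can be sketched as below. This is a sequential rendering of the log2(n)-step pattern a reduction kernel performs, under the assumption of a max-reduction; it is not the authors' kernel.

```python
def tree_reduce_max(values):
    """Sequential rendering of a GPU tree reduction: at each step, "thread" t
    combines element t with element t + stride, halving the active elements
    until the maximum sits at index 0."""
    a = list(values)
    n = len(a)
    stride = 1
    while stride < n:
        # on the GPU, each value of t below is one thread working in parallel
        for t in range(0, n - stride, 2 * stride):
            a[t] = max(a[t], a[t + stride])
        stride *= 2
    return a[0]
```

A kernel doing this over the attractiveness values of a firefly's neighbours yields, in log2(n) steps, the brightest candidate that the firefly could move toward.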
 