all the cases, being very efficient for a particular kind of algorithm. The fact
is that in GPGPUs the regularity of the data structures heavily conditions the
parallel operations on them. This penalizes widespread use, although it makes
them the best option for tasks such as simulation on Cartesian meshes. As an
additional drawback, getting the most out of them requires very heavy code
re-engineering.
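To illustrate the kind of access pattern that GPGPUs favour, consider a minimal sketch (ours, not from the paper) of a 5-point Laplacian on a Cartesian mesh; the function name and layout are hypothetical. Neighbour indices follow directly from the cell coordinates, so memory access is regular and unit-stride, which is exactly what wide SIMD units and GPUs exploit best:

/* Hypothetical sketch: a 5-point Laplacian on a Cartesian mesh.
 * Neighbour indices follow directly from (i, j), so access is
 * regular and unit-stride. */
#include <stddef.h>

void laplacian_cartesian(const double *u, double *out,
                         size_t nx, size_t ny)
{
    for (size_t j = 1; j < ny - 1; ++j)
        for (size_t i = 1; i < nx - 1; ++i) {
            size_t k = j * nx + i;          /* row-major cell index */
            out[k] = u[k - 1] + u[k + 1]    /* west / east          */
                   + u[k - nx] + u[k + nx]  /* south / north        */
                   - 4.0 * u[k];
        }
}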
A second option that has appeared more recently is the Intel Xeon Phi
(IXP) accelerator, also known as MIC. Being based on the x86 architecture,
it does not require special re-coding. The IXP represents a very appealing
architecture for codes such as Alya for several reasons. Firstly, Alya does
not exploit Cartesian mesh structure because it is specifically designed for
unstructured meshes, where the connectivity bandwidth is not uniform and data
access is more complex, as sketched below. Due to their flexibility, unstructured
meshes are well suited for complex geometries. Secondly, due to the physics
that Alya solves, the numerical schemes cannot guarantee that all the threads
will have the same amount of work. Finally, coupled multi-physics requires a
lot of flexibility to program the different subproblems and, above all, the
coupling schemes. It is worth mentioning that Alya is around 500K lines of
code, developed by more than 40 researchers who are experts in different
disciplines. There is only one version of the code, standard enough to run on
several platforms and designed to run both in parallel and sequentially from
the same source. We have made portability, flexibility and code re-use three
of the main pillars of Alya. Therefore, we look for an accelerator on which
we can keep those same flags flying. This paper moves in that direction,
exploring the possibilities of the Intel Xeon Phi.
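The sketch below (again ours, with hypothetical names; Alya itself is not shown) applies the same kind of operator on an unstructured mesh stored in compressed-row (CSR) form. Neighbours are reached through an indirection array and their number varies per node, which is what makes both the access pattern and the per-iteration work irregular:

/* Hypothetical sketch: an unstructured-mesh operator over a CSR
 * connectivity. The indirect gather and the variable node degree
 * are what break the regularity that GPGPUs rely on. */
#include <stddef.h>

void laplacian_unstructured(const double *u, double *out, size_t nnodes,
                            const size_t *row_ptr,  /* size nnodes + 1 */
                            const size_t *col_idx)  /* neighbour lists */
{
    for (size_t n = 0; n < nnodes; ++n) {
        double acc = 0.0;
        size_t deg = row_ptr[n + 1] - row_ptr[n];   /* varies per node */
        for (size_t p = row_ptr[n]; p < row_ptr[n + 1]; ++p)
            acc += u[col_idx[p]];                   /* indirect gather */
        out[n] = acc - (double)deg * u[n];
    }
}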
We attack the porting to the Intel Xeon Phi in stages. In this paper, as
a starting point, we focus on MPI parallelism. It is a relatively natural path,
because Alya has already shown good scalability for cases where the parallel
work is distributed only through MPI tasks. Additionally, we observe that
debugging a parallel application based on MPI tasks is easier than one based
on OpenMP threads, so we can pinpoint the origin of any differences in results
or in performance. In a subsequent paper we will address the hybrid case.
The tests have been carried out in native mode, where the code is compiled
for and run directly on the accelerator.
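As a minimal sketch of what native mode means in practice, the following trivial MPI program can be cross-compiled for the coprocessor and launched on it with the Intel toolchain of that generation; host names, rank counts and paths in the comments are illustrative assumptions, not taken from the paper:

/* Minimal native-mode sketch. Assuming Intel compilers and Intel MPI,
 * the workflow is roughly:
 *
 *   mpiicc -mmic -o hello.mic hello.c      # cross-compile for MIC
 *   export I_MPI_MIC=enable
 *   mpirun -n 60 -host mic0 ./hello.mic    # run natively on the card
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d running natively on the coprocessor\n",
           rank, size);
    MPI_Finalize();
    return 0;
}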
Briefly, we wanted to explore the following aspects:
- How much porting effort is required, and how much of the code must be
rewritten and/or re-engineered?
- As Alya is specifically targeted at engineering simulations, would it be
possible for a small company to upgrade a workstation just by buying a couple
of IXPs? Is it possible for them to run the same kind of problems with little
effort while still scaling?
- Under the same parallelization scheme, i.e. MPI tasks, could the IXP be
considered a “small cluster”?
- What is the behaviour of Alya when using accelerators hosted in different
nodes?