all the cases, being very efficient for a particular kind of algorithm. The fact
is that in GPGPUs the regularity of the data structures heavily conditions the
parallel operations on them. This penalizes widespread use, although it makes
them the best option for tasks such as simulation on Cartesian meshes. As an
additional drawback, getting the most out of them requires very heavy code
re-engineering.
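To illustrate the kind of access pattern that GPGPUs favour, consider a minimal sketch (ours, not from the paper) of a 5-point Laplacian on a Cartesian mesh; the function name and layout are hypothetical. Neighbour indices follow directly from the cell coordinates, so memory access is regular and unit-stride, which is exactly what wide SIMD units and GPUs exploit best:

/* Hypothetical sketch: a 5-point Laplacian on a Cartesian mesh.
 * Neighbour indices follow directly from (i, j), so access is
 * regular and unit-stride. */
#include <stddef.h>

void laplacian_cartesian(const double *u, double *out,
                         size_t nx, size_t ny)
{
    for (size_t j = 1; j < ny - 1; ++j)
        for (size_t i = 1; i < nx - 1; ++i) {
            size_t k = j * nx + i;          /* row-major cell index */
            out[k] = u[k - 1] + u[k + 1]    /* west / east          */
                   + u[k - nx] + u[k + nx]  /* south / north        */
                   - 4.0 * u[k];
        }
}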
A second option that has appeared more recently is the Intel Xeon Phi
(IXP) accelerator, also known as MIC. Being based on the x86 architecture,
it does not require special re-coding. The IXP represents a very appealing
architecture for codes such as Alya for several reasons. Firstly, Alya does
not exploit Cartesian mesh structure because it is specifically designed for
unstructured meshes, where the connectivity bandwidth is not uniform and data
access is more complex, as sketched below. Due to their flexibility, unstructured
meshes are well suited for complex geometries. Secondly, due to the physics
that Alya solves, the numerical schemes cannot guarantee that all the threads
will have the same amount of work. Finally, coupled multi-physics requires a
lot of flexibility to program the different subproblems and, above all, the
coupling schemes. It is worth mentioning that Alya is around 500K lines of
code, developed by more than 40 researchers who are experts in different
disciplines. There is only one version of the code, standard enough to run on
several platforms and designed to run both in parallel and sequentially from
the same source. We have made portability, flexibility and code re-use three
of the main pillars of Alya. Therefore, we look for an accelerator on which
we can keep those same flags flying. This paper moves in that direction,
exploring the possibilities of the Intel Xeon Phi.
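The sketch below (again ours, with hypothetical names; Alya itself is not shown) applies the same kind of operator on an unstructured mesh stored in compressed-row (CSR) form. Neighbours are reached through an indirection array and their number varies per node, which is what makes both the access pattern and the per-iteration work irregular:

/* Hypothetical sketch: an unstructured-mesh operator over a CSR
 * connectivity. The indirect gather and the variable node degree
 * are what break the regularity that GPGPUs rely on. */
#include <stddef.h>

void laplacian_unstructured(const double *u, double *out, size_t nnodes,
                            const size_t *row_ptr,  /* size nnodes + 1 */
                            const size_t *col_idx)  /* neighbour lists */
{
    for (size_t n = 0; n < nnodes; ++n) {
        double acc = 0.0;
        size_t deg = row_ptr[n + 1] - row_ptr[n];   /* varies per node */
        for (size_t p = row_ptr[n]; p < row_ptr[n + 1]; ++p)
            acc += u[col_idx[p]];                   /* indirect gather */
        out[n] = acc - (double)deg * u[n];
    }
}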
We attack the porting to the Intel Xeon Phi in stages. In this paper, as
a starting point, we focus on MPI parallelism. It is a relatively natural path,
because Alya has already shown good scalability for cases where the parallel
work is distributed only through MPI tasks. Additionally, we observe that
debugging a parallel application based on MPI tasks is easier than one based
on OpenMP threads, so we can pinpoint the origin of any differences in results
or in performance. In a subsequent paper we will address the hybrid case.
The tests have been carried out in native mode, where the code is compiled
for and run directly on the accelerator.
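As a minimal sketch of what native mode means in practice, the following trivial MPI program can be cross-compiled for the coprocessor and launched on it with the Intel toolchain of that generation; host names, rank counts and paths in the comments are illustrative assumptions, not taken from the paper:

/* Minimal native-mode sketch. Assuming Intel compilers and Intel MPI,
 * the workflow is roughly:
 *
 *   mpiicc -mmic -o hello.mic hello.c      # cross-compile for MIC
 *   export I_MPI_MIC=enable
 *   mpirun -n 60 -host mic0 ./hello.mic    # run natively on the card
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d running natively on the coprocessor\n",
           rank, size);
    MPI_Finalize();
    return 0;
}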
Briefly, we wanted to explore the following aspects:
- How much porting effort is required, and how much of the code must be
rewritten and/or re-engineered?
- As Alya is specifically targeted at engineering simulations, would it be
possible for a small company to upgrade a workstation just by buying a couple
of IXPs? Is it possible for them to run the same kind of problems with little
effort while still scaling?
- Under the same parallelization scheme, i.e. MPI tasks, could the IXP be
considered a “small cluster”?
- What is the behaviour of Alya when using accelerators hosted in different
nodes?