Alya Multiphysics Simulations on Intel’s Xeon Phi Accelerators - High Performance Computing

2 Computational Aspects

2.1 Porting and Running on Intel Xeon Phi Coprocessor

Porting was straightforward and required no additional programming. At the
time of the tests, BSC's main supercomputer, Marenostrum, was being upgraded
to include 84 Xeon Phi accelerators. For that reason, the main effort went
into providing feedback to the system administrators so that the Intel Xeon
Phi (IXP) could be set up cleanly. Alya is written entirely in Fortran 2003,
strictly following the standard. Except for Metis [1], no third-party library
is used. Alya has been compiled and tested on several supercomputer
architectures using different compilers, including Intel products. Therefore,
no special effort was required, except for adding the compiler option -mmic.

All cases were tested on Marenostrum III (MNIII), whose computing nodes each
have two 8-core Sandy Bridge-EP E5-2670 processors (2.6GHz/1600 20M) and
32 GB of memory. Each node has 2 PCIe x24, each one connected to a Xeon Phi
5110P with 8 GB of memory. One of the two Sandy Bridge processors has an
Infiniband card connected to a PCIe x8. Finally, Mellanox provides a virtual
interface to each Xeon Phi, allowing a fast and transparent interconnection
among all 84 accelerators in MNIII. Alya is compiled using the latest version
of the Intel Fortran Compiler and the Intel MPI Library.
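
The hardware figures above can be collected into a small record to derive a few aggregate numbers. This is only an illustrative sketch: the names NodeSpec and accelerated_nodes are hypothetical, and the node count assumes that the 84 accelerators are installed two per node, as the description suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node hardware figures as quoted in the text (hypothetical record)."""
    host_sockets: int = 2       # 2x Sandy Bridge-EP E5-2670
    cores_per_socket: int = 8   # 8-core processors
    host_memory_gb: int = 32    # host memory per node
    phis_per_node: int = 2      # one Xeon Phi 5110P per PCIe connection
    phi_memory_gb: int = 8      # memory per Xeon Phi


def accelerated_nodes(total_phis: int, node: NodeSpec) -> int:
    """Number of nodes needed to host `total_phis` accelerators.

    Assumes every accelerated node carries exactly `phis_per_node` devices.
    """
    if total_phis % node.phis_per_node != 0:
        raise ValueError("accelerator count is not a multiple of phis_per_node")
    return total_phis // node.phis_per_node


node = NodeSpec()
TOTAL_PHIS = 84  # accelerators in MNIII, per the text

n_nodes = accelerated_nodes(TOTAL_PHIS, node)
host_cores_per_node = node.host_sockets * node.cores_per_socket
total_phi_memory_gb = TOTAL_PHIS * node.phi_memory_gb

print(f"accelerated nodes:      {n_nodes}")
print(f"host cores per node:    {host_cores_per_node}")
print(f"total Xeon Phi memory:  {total_phi_memory_gb} GB")
```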

In both of the cases shown here, Xeon Phi performance is assessed taking into
account the following aspects:

- Each Xeon Phi has 60 cores, each allowing up to 4 hardware threads.
  Therefore, each Xeon Phi can run up to 240 MPI tasks in parallel.
- Pure MPI cases are considered, setting the OpenMP environment variable
  OMP_NUM_THREADS=1. This is done explicitly to force a single software
  thread in regions where Alya has OpenMP-parallelized loops.
- Runs are in native mode, with all the MPI tasks running on board the Xeon
  Phi. In this first test, the host does not provide computing power.
- MPI tasks are shuffled among four accelerators belonging to two different
  hosts.
- No special compilation options were used, except for -O1.
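
The task counts implied by this setup can be checked with a short sketch. The text does not say how tasks are distributed among the four accelerators, so the round-robin policy in place_ranks is an assumption made purely for illustration; the function names and the PURE_MPI_ENV dictionary are hypothetical as well.

```python
from typing import Dict, List

PHI_CORES = 60         # cores per Xeon Phi, from the text
THREADS_PER_CORE = 4   # hardware threads per core, from the text
PHI_MEMORY_GB = 8      # Xeon Phi 5110P memory, from the hardware description

# Environment for a pure-MPI run: one OpenMP thread per MPI task.
PURE_MPI_ENV = {"OMP_NUM_THREADS": "1"}


def max_tasks_per_phi(cores: int = PHI_CORES,
                      threads_per_core: int = THREADS_PER_CORE) -> int:
    """Upper bound on MPI tasks per accelerator: one task per hardware thread."""
    return cores * threads_per_core


def place_ranks(n_ranks: int, n_devices: int) -> Dict[int, List[int]]:
    """Assign MPI ranks to accelerators round-robin (assumed policy).

    Rank r is placed on device r % n_devices. Raises ValueError if the
    request exceeds one task per hardware thread.
    """
    capacity = n_devices * max_tasks_per_phi()
    if n_ranks > capacity:
        raise ValueError(f"{n_ranks} ranks exceed the capacity of {capacity}")
    placement: Dict[int, List[int]] = {d: [] for d in range(n_devices)}
    for rank in range(n_ranks):
        placement[rank % n_devices].append(rank)
    return placement


# Four accelerators on two hosts, fully subscribed.
placement = place_ranks(n_ranks=4 * max_tasks_per_phi(), n_devices=4)
tasks_per_device = {d: len(ranks) for d, ranks in placement.items()}

# Rough memory budget per task when one accelerator is fully subscribed.
memory_per_task_mb = PHI_MEMORY_GB * 1024 / max_tasks_per_phi()

print(tasks_per_device)
print(f"about {memory_per_task_mb:.0f} MB per task")
```

With every hardware thread occupied, each task has only about 34 MB of accelerator memory available, and less in practice because the coprocessor's own system software also uses part of the 8 GB.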

2.2 Simulation Examples

We have chosen several cases of increasing complexity to examine different
simulation scenarios and schemes. The features common to all the cases tested
are relatively complex geometries and unstructured meshes that mix different
element types (tetrahedra, hexahedra, prisms and pyramids). Strong scalability
is measured by computing the CPU time required for each cycle of the time-step
loop. Both explicit and implicit schemes are tested. In this paper we show the
strong scalability for compressible flow and incompressible flow and combustion
for a multi-physics case.
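
As a reminder of how strong scalability is usually derived from such measurements, the sketch below computes speedup and parallel efficiency from the time per time step at a fixed problem size. The function name strong_scaling is hypothetical and the timings are made-up placeholders, not results from the paper.

```python
from typing import Dict, List, Tuple


def strong_scaling(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Speedup and efficiency relative to the smallest task count.

    `timings` maps the number of MPI tasks to the measured time (seconds)
    for one cycle of the time-step loop, for the same mesh in every run.
    Returns (tasks, speedup, efficiency) tuples sorted by task count.
    """
    if not timings:
        raise ValueError("no timings given")
    if any(t <= 0 for t in timings.values()):
        raise ValueError("timings must be positive")
    base_tasks = min(timings)
    base_time = timings[base_tasks]
    rows = []
    for tasks in sorted(timings):
        speedup = base_time / timings[tasks]   # measured gain over the baseline
        ideal = tasks / base_tasks             # gain under perfect scaling
        rows.append((tasks, speedup, speedup / ideal))
    return rows


# Placeholder timings (seconds per time step); NOT measured data.
example_timings = {60: 8.0, 120: 4.0, 240: 2.5}

for tasks, speedup, efficiency in strong_scaling(example_timings):
    print(f"{tasks:4d} tasks  speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

An efficiency close to 1 means that the time per step falls almost in proportion to the number of tasks; values well below 1 indicate that communication or load imbalance is starting to dominate.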
