2 Computational Aspects
2.1 Porting and Running on Intel Xeon Phi Coprocessor
Porting was straightforward and required no additional programming. At the
time of the tests, BSC's main supercomputer, Marenostrum, was being upgraded
to include 84 Xeon Phi accelerators. For that reason, the main effort was to
provide feedback to the system administrators so that the Intel Xeon Phi (IXP)
set-up was clean. Alya is written entirely in Fortran 2003, strictly following
the standard. Except for Metis [1], no third-party library is used. Alya has
been compiled and tested on several supercomputer architectures using different
compilers, including Intel products. Therefore, no special effort was required,
except for adding the compiler option -mmic.
All cases were tested on Marenostrum III (MNIII), whose computing nodes
are 2x Sandy Bridge-EP E5-2670 (2.6 GHz/1600, 20 MB cache, 8 cores), with
32 GB of memory. Each node has 2 PCIe x24, each one connected to a Xeon Phi
5110P with 8 GB of memory. One of the two Sandy Bridge processors has an
InfiniBand card connected to a PCIe x8. Finally, Mellanox provides a virtual
interface to each Xeon Phi, allowing a fast and transparent interconnection
between all 84 accelerators in MNIII. Alya is compiled using the latest
version of the Intel Fortran Compiler and the Intel MPI Library.
In both of the cases shown here, Xeon Phi performance is assessed taking into
account the following aspects:
- Each Xeon Phi has 60 cores, each of them allowing up to 4 hardware threads.
Therefore, each Xeon Phi can run up to 240 MPI tasks in parallel.
- Pure MPI cases are considered, setting the OpenMP environment variable
OMP_NUM_THREADS=1. This is done explicitly to force a single software
thread in regions where Alya has OpenMP-parallelized loops.
- Runs are in native mode, with all the MPI tasks running on board the Xeon
Phi. In this first test, the host does not provide computing power.
- MPI tasks are shuffled among four accelerators corresponding to two different
hosts.
- No special compilation options were used, except for -O1.
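The task limits implied by the list above can be sketched in a few lines of Python. This is only an illustration: the card labels and the round-robin placement policy are assumptions made for the example, not the scheme actually used by Alya or by the MNIII job scripts. Only the 60 cores, 4 hardware threads per core, and the four accelerators on two hosts come from the text.

```python
# Illustrative sketch only. The round-robin policy and the card labels are
# assumptions for this example, not taken from Alya or the MNIII set-up.

CORES_PER_PHI = 60       # from the text
THREADS_PER_CORE = 4     # from the text
MAX_TASKS_PER_PHI = CORES_PER_PHI * THREADS_PER_CORE  # 240 MPI tasks per card

# Hypothetical labels: two hosts, each with two Xeon Phi cards.
CARDS = ["host0-mic0", "host0-mic1", "host1-mic0", "host1-mic1"]


def place_tasks(n_tasks, cards):
    """Assign MPI ranks 0..n_tasks-1 to the given cards in round-robin order.

    Raises ValueError if the request exceeds one task per hardware thread.
    """
    capacity = MAX_TASKS_PER_PHI * len(cards)
    if n_tasks > capacity:
        raise ValueError(
            f"{n_tasks} tasks requested, but only {capacity} hardware threads"
        )
    placement = {card: [] for card in cards}
    for rank in range(n_tasks):
        placement[cards[rank % len(cards)]].append(rank)
    return placement


if __name__ == "__main__":
    # Pure MPI run: one software thread per task, so each task maps to
    # one hardware thread (this mirrors setting OMP_NUM_THREADS=1).
    for card, ranks in place_tasks(480, CARDS).items():
        print(card, len(ranks), "tasks")
```

With four cards the upper bound is 4 x 240 = 960 MPI tasks; any larger request is rejected rather than oversubscribing the hardware threads.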
2.2 Simulation Examples
We have chosen several cases of increasing complexity to examine different
simulation scenarios and schemes. The common features of all the cases tested
are relatively complex geometries, unstructured meshes, and a mix of
element types (tetrahedra, hexahedra, prisms and pyramids). Strong scalability
is measured by computing the CPU time required for each cycle of the time-step
loop. Both explicit and implicit schemes are tested. In this paper we show the
strong scalability for compressible flow and incompressible flow, and for
combustion as a multi-physics case.
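As a minimal sketch of how such a strong-scalability measurement is usually reduced, the following Python function turns CPU times per time-step cycle into speedup and parallel efficiency relative to the smallest run. The function name and the input format are assumptions for illustration, and the timings in the usage example are made-up numbers, not measurements from these tests.

```python
def strong_scaling(timings):
    """Compute speedup and efficiency for a fixed-size problem.

    timings: dict mapping number of MPI tasks -> CPU time per time-step
             cycle (seconds). The run with the fewest tasks is the baseline.
    Returns a list of (tasks, speedup, efficiency) tuples sorted by tasks.
    """
    if not timings:
        raise ValueError("at least one timing is required")
    base_tasks = min(timings)
    base_time = timings[base_tasks]
    rows = []
    for tasks in sorted(timings):
        speedup = base_time / timings[tasks]   # measured gain over baseline
        ideal = tasks / base_tasks             # gain under perfect scaling
        rows.append((tasks, speedup, speedup / ideal))
    return rows


if __name__ == "__main__":
    # Made-up timings for illustration only (not results from the paper).
    example = {60: 8.0, 120: 4.0, 240: 2.5}
    for tasks, speedup, efficiency in strong_scaling(example):
        print(f"{tasks:4d} tasks  speedup {speedup:.2f}  efficiency {efficiency:.2f}")
```

An efficiency close to 1 means that doubling the number of MPI tasks roughly halves the time per cycle; values well below 1 indicate that communication or load imbalance is starting to dominate.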
 