Figure 2 shows the strong scalability of the Onera M6 compressible flow
explicit solver, for both the host nodes and the accelerators. This comparison
is done to assess whether the communication overhead penalizes both cases
similarly. Efficiency degrades somewhat earlier in the case of the Xeon Phi
because the accelerator cores are much slower than the host cores, up to 20
times slower. This point is discussed further in the conclusions. As in
previous works [10], we define the scalability "sweet spot" for a given problem
by keeping the parallel efficiency higher than, say, 75%-80%, while increasing
the number of MPI tasks. This yields a mean number of elements per task which
is the lower limit for sustained linear scalability. This number depends on the
physical problem being solved and how it is implemented, the solution scheme,
the mesh size and element types, etc.: in this case, compressible flow solved
explicitly on 1.8M tetrahedra.
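As a concrete illustration of this definition, the following sketch computes strong-scaling parallel efficiency from measured run times and locates the sweet spot. The timing values are hypothetical placeholders, not measurements from this paper; only the 1.8M-element mesh size and the 80% efficiency floor come from the text.

```python
# Sketch: locate the strong-scaling "sweet spot" from measured run times.
# The timings below are hypothetical placeholders, not data from the paper.

N_ELEMENTS = 1_800_000          # mesh size: 1.8M tetrahedra (Onera M6 case)
EFFICIENCY_FLOOR = 0.80         # keep parallel efficiency above ~75-80%

# (MPI tasks, wall-clock time per step in seconds) -- illustrative only
timings = [(16, 100.0), (32, 51.0), (64, 26.5), (128, 14.8), (256, 9.1)]

base_tasks, base_time = timings[0]
sweet_spot = None
for tasks, time in timings:
    # strong-scaling efficiency relative to the smallest run
    speedup = base_time / time
    efficiency = speedup / (tasks / base_tasks)
    elems_per_task = N_ELEMENTS / tasks
    print(f"{tasks:4d} tasks: eff = {efficiency:5.2f}, "
          f"{elems_per_task:9.0f} elements/task")
    if efficiency >= EFFICIENCY_FLOOR:
        sweet_spot = elems_per_task   # lowest load still above the floor

print(f"sweet spot: ~{sweet_spot:.0f} elements per MPI task")
```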
Kiln Furnace: Low Mach Incompressible Flow with Combustion and
Chemical Reactions, Implicit Scheme. This is a complex multi-physics
problem: a kiln furnace, typical of the cement industry. It is a large
cylindrical vessel in slow rotary motion in which concrete and aggregates are
cooked, with temperatures rising up to 2000 degrees. The length can reach 120
meters and the diameter 20 meters. The air is simulated in an incompressible
flow regime with a low Mach approximation; temperature transport is solved
with the heat equation as it is convected by the fluid; and several species,
which react with each other, both producing and consuming energy, are
transported. In this case, there are 5 species. The three problems are
solved in a segregated but strongly coupled way, all of them through an implicit
time integration scheme. The problem is described in detail in [6].
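A minimal sketch of what such a segregated but strongly coupled implicit step might look like: each problem is solved in turn, but the sequence is iterated within the step until the coupled state converges. The solver routines and placeholder updates are hypothetical stand-ins, not Alya's actual modules.

```python
import numpy as np

N_SPECIES = 5   # number of transported species in the kiln case

# Hypothetical stand-ins for the implicit solvers; a real code would
# assemble and solve a linear system for each field at every call.
def solve_momentum_implicit(u, T, dt):
    return u + dt * 0.01 * (T - T.mean())          # placeholder buoyancy effect

def solve_temperature_implicit(T, u, Y, dt):
    return T + dt * 0.01 * sum(Y)                  # placeholder reaction heating

def solve_species_implicit(Yk, u, T, dt):
    return Yk - dt * 0.01 * Yk                     # placeholder consumption

def coupled_time_step(u, T, Y, dt, tol=1e-8, max_coupling_iters=20):
    """One segregated, strongly coupled implicit step: solve each problem
    in turn, then iterate until the coupled state stops changing."""
    for _ in range(max_coupling_iters):
        u_old, T_old = u.copy(), T.copy()
        u = solve_momentum_implicit(u, T, dt)        # 1. low-Mach flow
        T = solve_temperature_implicit(T, u, Y, dt)  # 2. convected heat equation
        Y = [solve_species_implicit(Yk, u, T, dt) for Yk in Y]  # 3. species
        if (np.linalg.norm(u - u_old) < tol and
                np.linalg.norm(T - T_old) < tol):    # strong-coupling test
            break
    return u, T, Y

# tiny usage example on a 1-D dummy field
u = np.zeros(100)
T = np.full(100, 300.0)
Y = [np.full(100, 0.2) for _ in range(N_SPECIES)]
u, T, Y = coupled_time_step(u, T, Y, dt=1e-3)
```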
Figure 1 shows a snapshot of the temperature contours in a kiln during the
ignition phase. Figure 3 plots the scalability of the fluid phase. In this case,
the elements-per-task sweet spot, where the scalability is sustained with no less
than 80% efficiency, goes up to around 10K. The sweet spot is therefore reached
with fewer MPI tasks, and the authors cannot go beyond 80 MPI tasks without a
serious loss of parallel performance.
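This limit follows directly from the sweet spot: dividing the mesh size by the minimum elements-per-task load gives the largest useful task count. A minimal check, assuming a fluid mesh of roughly 800K elements (inferred from the two figures above; the mesh size is not stated in the text):

```python
# The sweet spot caps the useful MPI task count.  The ~800K-element fluid
# mesh is an assumption inferred from the two numbers quoted above
# (10K elements/task, ~80-task limit), not a figure stated in the paper.
mesh_elements = 800_000
sweet_spot = 10_000                      # elements per task at >= 80% efficiency
max_tasks = mesh_elements // sweet_spot
print(max_tasks)                         # -> 80, the limit reported above
```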
3 Conclusions and Future Lines
This paper is a very preliminary assessment of Alya on Intel Xeon Phi accelerators.
Intel Xeon Phi is a valuable option as an accelerator for supercomputing
applications on complex geometries with multiphysics. This is especially the case
when the simulation code has already been parallelized using MPI. Just by
compiling the code using the -mmic option, a running binary is obtained, with
very similar scalability properties when compared to the host binary, with no
code re-writing or re-engineering. In this paper we tested it on multi-physics
examples with both explicit and implicit schemes. However, there are several
points to improve: