Figure 2 shows the strong scalability of the Onera M6 compressible flow
explicit solver, for both the host nodes and the accelerators. This comparison
is done to assess whether the communication overhead penalizes both cases
similarly. Efficiency degrades somewhat earlier in the case of the Xeon Phi
because the accelerator cores are much slower than the host cores, up to 20
times slower. This point is discussed further in the conclusions. As in
previous works [10], we define the scalability "sweet spot" for a given problem
by keeping the parallel efficiency higher than, say, 75%-80%, while increasing
the number of MPI tasks. This yields a mean number of elements per task which
is the lower limit for sustained linear scalability. This number depends on the
physical problem being solved and how it is implemented, the solution scheme,
the mesh size and element types, etc.: in this case, compressible flow solved
explicitly on 1.8M tetrahedra.
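As a concrete illustration of this definition, the following sketch computes strong-scaling parallel efficiency from measured run times and locates the sweet spot. The timing values are hypothetical placeholders, not measurements from this paper; only the 1.8M-element mesh size and the 80% efficiency floor come from the text.

```python
# Sketch: locate the strong-scaling "sweet spot" from measured run times.
# The timings below are hypothetical placeholders, not data from the paper.

N_ELEMENTS = 1_800_000          # mesh size: 1.8M tetrahedra (Onera M6 case)
EFFICIENCY_FLOOR = 0.80         # keep parallel efficiency above ~75-80%

# (MPI tasks, wall-clock time per step in seconds) -- illustrative only
timings = [(16, 100.0), (32, 51.0), (64, 26.5), (128, 14.8), (256, 9.1)]

base_tasks, base_time = timings[0]
sweet_spot = None
for tasks, time in timings:
    # strong-scaling efficiency relative to the smallest run
    speedup = base_time / time
    efficiency = speedup / (tasks / base_tasks)
    elems_per_task = N_ELEMENTS / tasks
    print(f"{tasks:4d} tasks: eff = {efficiency:5.2f}, "
          f"{elems_per_task:9.0f} elements/task")
    if efficiency >= EFFICIENCY_FLOOR:
        sweet_spot = elems_per_task   # lowest load still above the floor

print(f"sweet spot: ~{sweet_spot:.0f} elements per MPI task")
```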
Kiln Furnace: Low Mach Incompressible Flow with Combustion and
Chemical Reactions, Implicit Scheme. This is a complex multi-physics
problem: a kiln furnace, typical of the cement industry. It is a large
cylindrical vessel in slow rotary motion in which concrete and aggregates are
cooked, with temperatures rising up to 2000 degrees. The length can reach 120
meters and the diameter 20 meters. The air is simulated in an incompressible
flow regime with a low Mach approximation; temperature transport is solved
with the heat equation as it is convected by the fluid; and several species,
which react with each other, both producing and consuming energy, are
transported. In this case, there are 5 species. The three problems are
solved in a segregated but strongly coupled way, all of them through an implicit
time integration scheme. The problem is described in detail in [6].
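A minimal sketch of what such a segregated but strongly coupled implicit step might look like: each problem is solved in turn, but the sequence is iterated within the step until the coupled state converges. The solver routines and placeholder updates are hypothetical stand-ins, not Alya's actual modules.

```python
import numpy as np

N_SPECIES = 5   # number of transported species in the kiln case

# Hypothetical stand-ins for the implicit solvers; a real code would
# assemble and solve a linear system for each field at every call.
def solve_momentum_implicit(u, T, dt):
    return u + dt * 0.01 * (T - T.mean())          # placeholder buoyancy effect

def solve_temperature_implicit(T, u, Y, dt):
    return T + dt * 0.01 * sum(Y)                  # placeholder reaction heating

def solve_species_implicit(Yk, u, T, dt):
    return Yk - dt * 0.01 * Yk                     # placeholder consumption

def coupled_time_step(u, T, Y, dt, tol=1e-8, max_coupling_iters=20):
    """One segregated, strongly coupled implicit step: solve each problem
    in turn, then iterate until the coupled state stops changing."""
    for _ in range(max_coupling_iters):
        u_old, T_old = u.copy(), T.copy()
        u = solve_momentum_implicit(u, T, dt)        # 1. low-Mach flow
        T = solve_temperature_implicit(T, u, Y, dt)  # 2. convected heat equation
        Y = [solve_species_implicit(Yk, u, T, dt) for Yk in Y]  # 3. species
        if (np.linalg.norm(u - u_old) < tol and
                np.linalg.norm(T - T_old) < tol):    # strong-coupling test
            break
    return u, T, Y

# tiny usage example on a 1-D dummy field
u = np.zeros(100)
T = np.full(100, 300.0)
Y = [np.full(100, 0.2) for _ in range(N_SPECIES)]
u, T, Y = coupled_time_step(u, T, Y, dt=1e-3)
```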
Figure 1 shows a snapshot of the temperature contours in a kiln during the
ignition phase. Figure 3 plots the scalability of the fluid phase. In this case,
the elements-per-task sweet spot, where the scalability is sustained with no less
than 80% efficiency, goes up to around 10K. The sweet spot is therefore reached
with fewer MPI tasks, and the authors cannot go beyond 80 MPI tasks without a
serious loss of parallel performance.
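This limit follows directly from the sweet spot: dividing the mesh size by the minimum elements-per-task load gives the largest useful task count. A minimal check, assuming a fluid mesh of roughly 800K elements (inferred from the two figures above; the mesh size is not stated in the text):

```python
# The sweet spot caps the useful MPI task count.  The ~800K-element fluid
# mesh is an assumption inferred from the two numbers quoted above
# (10K elements/task, ~80-task limit), not a figure stated in the paper.
mesh_elements = 800_000
sweet_spot = 10_000                      # elements per task at >= 80% efficiency
max_tasks = mesh_elements // sweet_spot
print(max_tasks)                         # -> 80, the limit reported above
```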
3 Conclusions and Future Lines
This paper is a very preliminary assessment of Alya on Intel Xeon Phi accelerators.
Intel Xeon Phi is a valuable option as an accelerator for supercomputing
applications on complex geometries with multiphysics. This is especially the case
when the simulation code has already been parallelized using MPI. Just by
compiling the code using the -mmic option, a running binary is obtained, with
very similar scalability properties when compared to the host binary, with no
code re-writing or re-engineering. In this paper we tested it on multi-physics
examples with both explicit and implicit schemes. However, there are several
points to improve: