well. A programming model such as OpenMP [92] is well suited for this task (though there are other methods). The OpenMP standard is implemented in a number of compilers. In our experience, application of the OpenMP parallel constructs to this case is reasonably straightforward and good speedup is achieved.
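As an illustration, the following C sketch shows how a two-center matrix build of this kind might be parallelized with a single OpenMP directive. The routine hop_integral() and the arrays pos and H are hypothetical stand-ins for the actual EHTB data structures, not part of any code discussed here.

```c
#include <stddef.h>

/* Hypothetical two-center integral; stands in for the EHTB routines. */
extern double hop_integral(const double *ri, const double *rj);

/* Fill the n x n Hamiltonian H (row-major) for atoms at pos[3*i]. */
void build_hamiltonian(int n, const double *pos, double *H)
{
    /* Element (i, j) depends only on the positions of atoms i and j,
     * so the outer loop carries no dependences and parallelizes directly. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            H[(size_t)i * n + j] = hop_integral(&pos[3 * i], &pos[3 * j]);
}
```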
In order to parallelize TB algorithms beyond a single SMP machine, it is necessary to use some sort of message passing parallel paradigm such as the one found in the Message Passing Interface (MPI) [92] standard. There are many implementations of this standard, both from vendors and from freely available sources (e.g., MPICH2 [93], OpenMPI [94]). Users should ensure that the MPI implementation they use is appropriate for their system for optimal performance. In the following, we will focus on parallelization of the eigenvalue solver through use of the ScaLAPACK library. We note that other libraries exist for this purpose, e.g., PETSc [95]/SLEPc [96].
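To fix ideas for the examples below, here is a minimal C sketch, under the assumptions of this section (1D block-cyclic layout, block size 64), of the setup a ScaLAPACK-based code needs: MPI start-up, a 1 x P BLACS process grid, and an array descriptor. The matrix dimension n is illustrative only, and Fortran-from-C calling conventions (e.g., hidden string-length arguments) vary by compiler, so treat this as a sketch rather than portable code.

```c
#include <mpi.h>

extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int icontxt, int what, int *val);
extern void Cblacs_gridinit(int *icontxt, const char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol, int *myrow, int *mycol);
extern void Cblacs_gridexit(int icontxt);
extern int  numroc_(const int *n, const int *nb, const int *iproc,
                    const int *isrc, const int *nprocs);
extern void descinit_(int *desc, const int *m, const int *n, const int *mb,
                      const int *nb, const int *irsrc, const int *icsrc,
                      const int *ictxt, const int *lld, int *info);

int main(int argc, char **argv)
{
    int n = 8307;                 /* illustrative: 923 atoms x 9 orbitals */
    int nb = 64, zero = 0, info;
    int me, nprocs, ctxt, nprow, npcol, myrow, mycol, desc[9];

    MPI_Init(&argc, &argv);
    Cblacs_pinfo(&me, &nprocs);
    Cblacs_get(-1, 0, &ctxt);                  /* default system context  */
    Cblacs_gridinit(&ctxt, "Row", 1, nprocs);  /* 1 x P grid: 1D layout   */
    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    /* Local storage: all n rows, but only this process's columns. */
    int nloc = numroc_(&n, &nb, &mycol, &zero, &npcol);
    descinit_(desc, &n, &n, &nb, &nb, &zero, &zero, &ctxt, &n, &info);

    /* ... allocate n x nloc local block, fill it, call the solver ... */

    Cblacs_gridexit(ctxt);
    MPI_Finalize();
    return 0;
}
```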
One of the most important factors in achieving good parallelization when using the ScaLAPACK [97] library is data layout. The ScaLAPACK User's Guide [98] has extensive documentation on this subject that we will not repeat here. We use a 1D block cyclic data layout for the examples presented in this section, with a block size of 64. A 2D block cyclic data layout is optimal for ScaLAPACK, but is not recommended until the number of processors exceeds 8 [98]. Since the storage for the matrices is distributed over the MPI processes, it is natural to have each process compute its elements of the Hamiltonian and overlap matrices. This involves some bookkeeping of the mapping between global and local matrix indices, which is thoroughly described in the ScaLAPACK User's Guide. Thus, parallelization of the construction of the Hamiltonian and overlap matrices arises naturally in this algorithm.
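The bookkeeping itself is simple arithmetic. The following C sketch reproduces the owner and local-index maps for a 1D block-cyclic layout with block size 64, using 0-based indices; it mirrors the logic of ScaLAPACK's INDXG2P and INDXG2L utilities under the assumption that distribution starts at process 0.

```c
#include <stdio.h>

#define NB 64   /* block size used in this section */

/* Process that owns global index g in a 1D block-cyclic layout. */
static int owner(int g, int nprocs)    { return (g / NB) % nprocs; }

/* Local index of global index g on its owning process. */
static int glob2loc(int g, int nprocs) { return (g / (NB * nprocs)) * NB + g % NB; }

int main(void)
{
    int nprocs = 4;
    /* e.g., global 280 lands on process 0 (its second block), local 88 */
    for (int g = 0; g < 300; g += 70)
        printf("global %3d -> process %d, local %3d\n",
               g, owner(g, nprocs), glob2loc(g, nprocs));
    return 0;
}
```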
We have implemented a parallel TB method in our EHTB code using ScaLAPACK as outlined above. In order to illustrate the potential of this method, we have computed eigenvalues and eigenvectors for a series of gold clusters from 13 to 923 atoms (with 9 orbitals per atom) using three parallel algorithms. The first algorithm is PDSYGVX from ScaLAPACK. This is the only routine in the current version of ScaLAPACK for the parallel computation of the generalized symmetric eigenvalue problem. Fortunately, several parallel solvers analogous to the solvers presented in Section 8.6.1 exist for the standard symmetric eigenvalue problem, and it is straightforward to produce the corresponding parallel solver for the generalized problem.
The parallel generalized symmetric eigenvalue solvers follow a simple pattern. A Cholesky factorization of the overlap matrix is formed using the ScaLAPACK routine PDPOTRF. The problem is transformed to a standard symmetric eigenvalue problem via a call to PDSYNGST. The standard symmetric eigenvalue problem is solved using PDSYEVX (or another subroutine, see below). Finally, the eigenvectors are backtransformed using PDTRSM. In order to produce a parallel solver for the generalized symmetric eigenvalue problem using the divide and conquer method, the call to PDSYEVX is replaced by a call to PDSYEVD. Similarly, in order to produce a parallel solver for the generalized symmetric eigenvalue problem using the MRRR algorithm, the call to PDSYEVX is replaced by a call to PDSYEVR [99]. At this time, PDSYEVR is not part of the ScaLAPACK library, but it is anticipated that it will become part of ScaLAPACK in the future.
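Putting the pattern together, the following C sketch shows one way the four calls might be assembled into a divide-and-conquer generalized solver of the PDSYGVD type (it is a sketch, not the actual routine). Workspace sizes are obtained with the standard LWORK = -1 query, error handling is minimal, and the SCALE output of PDSYNGST (almost always 1) is ignored here.

```c
#include <stdlib.h>

/* ScaLAPACK / PBLAS prototypes (Fortran calling convention). */
extern void pdpotrf_(const char *uplo, const int *n, double *a,
                     const int *ia, const int *ja, const int *desca, int *info);
extern void pdsyngst_(const int *ibtype, const char *uplo, const int *n,
                      double *a, const int *ia, const int *ja, const int *desca,
                      const double *b, const int *ib, const int *jb, const int *descb,
                      double *scale, double *work, const int *lwork, int *info);
extern void pdsyevd_(const char *jobz, const char *uplo, const int *n,
                     double *a, const int *ia, const int *ja, const int *desca,
                     double *w, double *z, const int *iz, const int *jz,
                     const int *descz, double *work, const int *lwork,
                     int *iwork, const int *liwork, int *info);
extern void pdtrsm_(const char *side, const char *uplo, const char *transa,
                    const char *diag, const int *m, const int *n,
                    const double *alpha, const double *a, const int *ia,
                    const int *ja, const int *desca, double *b,
                    const int *ib, const int *jb, const int *descb);

/* Solve H x = lambda S x for all eigenpairs.  H, S, Z are the local pieces
 * of block-cyclically distributed n x n matrices with the given descriptors;
 * w (length n) receives the eigenvalues.  Returns 0 on success. */
int pdsygvd_sketch(int n, double *H, int *desch, double *S, int *descs,
                   double *w, double *Z, int *descz)
{
    const int one = 1, ibtype = 1;
    const double done = 1.0;
    double scale, wq;
    int info, lwork, liwork, iwq;

    /* 1. Cholesky factorization S = L * L^T (PDPOTRF). */
    pdpotrf_("L", &n, S, &one, &one, descs, &info);
    if (info) return info;

    /* 2. Transform to a standard problem, H <- inv(L) H inv(L^T) (PDSYNGST). */
    lwork = -1;
    pdsyngst_(&ibtype, "L", &n, H, &one, &one, desch,
              S, &one, &one, descs, &scale, &wq, &lwork, &info);
    lwork = (int)wq;
    double *work = malloc((size_t)lwork * sizeof *work);
    pdsyngst_(&ibtype, "L", &n, H, &one, &one, desch,
              S, &one, &one, descs, &scale, work, &lwork, &info);
    free(work);
    if (info) return info;

    /* 3. Standard symmetric eigenproblem by divide and conquer (PDSYEVD);
     *    PDSYEVX or PDSYEVR would be the drop-in alternatives here. */
    lwork = liwork = -1;
    pdsyevd_("V", "L", &n, H, &one, &one, desch, w,
             Z, &one, &one, descz, &wq, &lwork, &iwq, &liwork, &info);
    lwork = (int)wq; liwork = iwq;
    work = malloc((size_t)lwork * sizeof *work);
    int *iwork = malloc((size_t)liwork * sizeof *iwork);
    pdsyevd_("V", "L", &n, H, &one, &one, desch, w,
             Z, &one, &one, descz, work, &lwork, iwork, &liwork, &info);
    free(work); free(iwork);
    if (info) return info;

    /* 4. Backtransform the eigenvectors, Z <- inv(L^T) Z (PDTRSM). */
    pdtrsm_("L", "L", "T", "N", &n, &n, &done,
            S, &one, &one, descs, Z, &one, &one, descz);
    return 0;
}
```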
Following the results of Section 8.6.1, we anticipate that the solver based on the divide and conquer algorithm, PDSYGVD, or the MRRR algorithm, PDSYGVR, will