ORCA

Test cases

We ran four RI-MP2 benchmark calculations on (H2O)3, (H2O)4, and (H2O)11 and five DFT benchmarks on (H2O)14 and (H2O)20:

  • RI-MP2/aug-cc-pVTZ energies - (H2O)11, 1012 basis functions
  • BLYP/QZVP energies - (H2O)20, 2340 basis functions
  • PBE0/QZVP energies - (H2O)20, 2340 basis functions
  • RI-MP2/aug-cc-pVDZ analytic gradients - 5 optimization cycles for (H2O)11, 451 basis functions
  • PBE0/def2-TZVPP analytic gradients - 5 optimization cycles for (H2O)14, ~1000 basis functions
  • BLYP/def2-TZVPP analytic gradients - 5 optimization cycles for (H2O)14, ~1000 basis functions
  • BLYP/QZVP analytic gradients - 5 optimization cycles for (H2O)14, 1638 basis functions
  • RI-MP2/aug-cc-pVTZ numerical frequencies from analytic gradients - (H2O)3, 276 basis functions
  • RI-MP2/aug-cc-pVTZ numerical frequencies from analytic gradients - (H2O)4, 368 basis functions

The calculations are available at /home/btemelso/benchmarks/ORCA/Marcy until they are moved to a more standard location.

A typical batch submission file for these calculations is shown below; the number of cores and nodes was varied as necessary.

#!/bin/tcsh
##PBS -q mercury
#PBS -l mem=30gb
#PBS -l nodes=1:ppn=16
#PBS -l walltime=4:00:00
#PBS -j oe
#PBS -e j2-__NPROCS__
#PBS -N j2-__NPROCS__
#PBS -V

set echo
cd $PBS_O_WORKDIR

runorca-2.9.csh __NPROCS__ $PBS_JOBID
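
For the scaling tests, one way to drive the sweep over core counts is a small helper like the sketch below. It assumes the template above is saved as orca-bench.pbs and that the __NPROCS__ placeholder is simply substituted before submission; the file name and the substitution mechanism are illustrative assumptions, not necessarily how these benchmark jobs were actually prepared.

#!/bin/tcsh
# Hypothetical helper: submit one copy of the benchmark job per core count
# by replacing the __NPROCS__ placeholder in the template (orca-bench.pbs).
# The nodes/ppn request in the template must be adjusted separately when
# a core count spans more than one node.
foreach np (4 8 16 32 64)
    sed "s/__NPROCS__/$np/g" orca-bench.pbs > orca-bench-$np.pbs
    qsub orca-bench-$np.pbs
end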

Code/binaries

Since ORCA is only distributed in binary form, we used the precompiled ORCA 2.9 binaries listed below.

  • Version 2.9.1 for x86_64 with openmpi 1.4.4 (orca_2_9_1_linux_x86-64.tbz)

Benchmarks

Energies

RI-MP2/aVTZ (H2O)11

RI-MP2/aug-cc-pVTZ energies - (H2O)11, 1012 basis functions

ncores | Wall time (mins) | Parallel efficiency relative to t_4 (1 = linear)
4      | 178              | 1.00
8      |  91              | 0.98
16     |  48              | 0.93
32     |  26              | 0.84
64     |  14              | 0.78
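
The parallel efficiency quoted in these tables appears to be the speedup over the 4-core run divided by the corresponding increase in core count, so that perfectly linear scaling gives 1:

  E_n = (4 × t_4) / (n × t_n)

For example, at 8 cores E_8 = (4 × 178) / (8 × 91) ≈ 0.98. Because the wall times are rounded to whole minutes, recomputed values can differ slightly from the tabulated ones.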

So, plain RI-MP2 energy calculations scale well up to 64 cores, and perhaps beyond, although we have not attempted larger runs.

BLYP/QZVP (H2O)20

Let's see how well energies scale for DFT calculations, starting with the pure functional BLYP.

BLYP/QZVP energies - (H2O)20, 2340 basis functions

ncores | Wall time (mins) | Parallel efficiency relative to t_4 (1 = linear)
4      | 14               | 1.00
8      | 11               | 0.66
16     |  9               | 0.38

That's not good. Maybe the calculation was too short (only 14 minutes on 4 cores) for the parallel overhead to be amortized.

PBE0/QZVP (H2O)20

Next, the same energy benchmark with the hybrid functional PBE0.

PBE0/QZVP energies - (H2O)20, 2340 basis functions

ncores | Wall time (mins) | Parallel efficiency relative to t_4 (1 = linear)
4      | 68               | 1.00
8      | 38               | 0.90
16     | 23               | 0.75

Hybrid functionals like PBE0 scale better than pure ones like BLYP.

Gradients

RI-MP2/aVDZ (H2O)11

RI-MP2/aVDZ analytic gradients - 5 optimization cycles for (H2O)11, 451 basis functions

ncores | Wall time (mins) | Parallel efficiency relative to t_4 (1 = linear)
4      | 161              | 1.00
8      |  86              | 0.94
16     |  48              | 0.84
32     |  30              | 0.66

The scaling for gradient calculations is good within a node, but degrades quickly once communication has to go between nodes over IB. Part of the problem is that the processes running on each node share globally accessible (non-local) scratch space.

PBE0/def2-TZVPP (H2O)14

We wanted to start with a hybrid functional like PBE0, which is not very amenable to density fitting but can still take advantage of ORCA's RIJCOSX approximation.
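
For orientation, a minimal ORCA input for this kind of run might look like the sketch below. This is an illustrative assumption rather than the actual input used for these benchmarks; in particular, the geometry file name and the auxiliary-basis keyword (here def2-TZVPP/J) may need to be adapted to the ORCA version in use.

# Sketch only - not the exact input used for these benchmarks
! PBE0 def2-TZVPP def2-TZVPP/J RIJCOSX Opt
# number of parallel processes
%pal nprocs 16 end
# limit the optimization to 5 cycles, as in the benchmark
%geom MaxIter 5 end
# charge 0, multiplicity 1; geometry read from an external xyz file
* xyzfile 0 1 h2o14.xyz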

PBE0/def2-TZVPP analytic gradients - 5 optimization cycles for (H2O)14, ~1000 basis functions

ncores | Wall time (mins) | Parallel efficiency relative to t_4 (1 = linear)
4      | 38               | 1.00
8      | 24               | 0.88
16     | 14               | 0.69

Even though ORCA is parallelized with OpenMPI and can run over IB if necessary, most of the multi-node jobs we attempted failed. The scaling within a node is acceptable.

BLYP/def2-TZVPP gradients (H2O)14

BLYP is a pure functional that should benefit greatly from density fitting.

BLYP/def2-TZVPP analytic gradients - 5 optimization cycles for (H2O)14, ~1000 basis functions

ncores | Wall time (mins) | Parallel efficiency relative to t_4 (1 = linear)
4      | 13               | 1.00
8      |  9               | 0.71
16     |  7               | 0.45

Once again, we couldn't run multi-node jobs successfully. The scaling within a node is acceptable.

BLYP/QZVP (H2O)14

BLYP/QZVP analytic gradients - 5 optimization cycles for (H2O)14, 1638 basis functions

ncores | Wall time (mins) | Parallel efficiency relative to t_4 (1 = linear)
4      | 55               | 1.00
8      | 39               | 0.70
16     | 28               | 0.48

Nothing new here; the intra-node scaling is essentially the same as with def2-TZVPP.

Numerical Frequencies from Analytic Gradients

RI-MP2/aVTZ (H2O)3

RI-MP2/aVTZ numerical frequencies from 54 analytic gradient calculations - (H2O)3, 276 basis functions
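
The number of gradient calculations follows from the cluster size: assuming two-sided (central) finite differences, a numerical Hessian needs 2 × 3N displaced gradients, i.e. 2 × 27 = 54 for the 9 atoms of (H2O)3 and 2 × 36 = 72 for the 12 atoms of (H2O)4 below.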

ncores | Wall time (mins) | Parallel efficiency relative to t_4 (1 = linear)
4      | 431              | 1.00
8      | 248              | 0.87
12     | 187              | 0.77
16     | 129              | 0.84

RI-MP2/aVTZ (H2O)4

RI-MP2/aVTZ numerical frequencies from 72 analytic gradient calculations - (H2O)4, 368 basis functions

ncores | Wall time (mins) | Parallel efficiency relative to t_4 (1 = linear)
4      | 1620             | 1.00
8      |  790             | 1.03
12     |  472             | 1.13
16     |  487             | 0.87

Much like the RI-MP2 gradient calculations, the scaling for numerical frequency calculations using analytic gradients is very good within a node.

Conclusions

  • Energy calculations using ORCA scale well, but gradients and numerical frequencies are better off run within a single SMP node
  • Surprisingly, the MP2 calculations scale better than the DFT ones
  • Equally surprisingly, hybrid functionals scale better than pure functionals

Since gradient calculations across multiple nodes usually failed, it is probably best to keep such calculations within a node. If one chooses to run these calculations over InfiniBand, a globally accessible scratch space needs to be used. That can be achieved by using the 'runorca-2.9-local.csh' script instead of 'runorca-2.9.csh'; it uses a globally accessible ~/scratch instead of the node-specific /scratch/$GROUP/$USER as scratch space. In any case, using the globally accessible scratch space will strain our NFS-mounted I/O system, so it is highly discouraged. With that in mind, we suggest that users run ORCA calculations within a single node regardless of the type of calculation (energy, gradient, frequency, MP2, DFT, etc.).
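
As a rough illustration of what the two scratch setups amount to (the directory names are taken from the description above; the actual runorca-2.9*.csh scripts may stage files differently, and input.inp is a placeholder name), the body of a job would typically look something like this:

# Sketch only - the real runorca-2.9.csh / runorca-2.9-local.csh scripts may differ
# Node-local scratch (default; fine for single-node jobs):
set scratchdir = /scratch/$GROUP/$USER/$PBS_JOBID
# Globally accessible scratch (needed for multi-node runs, but hard on the NFS server):
# set scratchdir = ~/scratch/$PBS_JOBID
mkdir -p $scratchdir
cp $PBS_O_WORKDIR/input.inp $scratchdir
cd $scratchdir
orca input.inp > $PBS_O_WORKDIR/input.out
# copy results (e.g. the wavefunction file) back and clean up
cp input.gbw $PBS_O_WORKDIR
rm -rf $scratchdir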

It will be interesting to see how the new ORCA release compares with the current version in terms of per-core performance and scaling.

documentation/benchmarks/orca.txt · Last modified: 2013/08/15 09:40 by btemelso


