Table of Contents

We ran four RI-MP2 benchmarks (energies and gradients on (H2O)_{11}, numerical frequencies on (H2O)_{3} and (H2O)_{4}) and five DFT benchmarks (energies on (H2O)_{20}, gradients on (H2O)_{14}):

- RI-MP2/aug-cc-pVTZ energies - (H2O)_{11}, 1012 basis functions
- BLYP/QZVP energies - (H2O)_{20}, 2340 basis functions
- PBE0/QZVP energies - (H2O)_{20}, 2340 basis functions
- RI-MP2/aug-cc-pVDZ analytic gradients - 5 optimization cycles for (H2O)_{11}, 451 basis functions
- PBE0/def2-TZVPP analytic gradients - 5 optimization cycles for (H2O)_{14}, ~1000 basis functions
- BLYP/def2-TZVPP analytic gradients - 5 optimization cycles for (H2O)_{14}, ~1000 basis functions
- BLYP/QZVP analytic gradients - 5 optimization cycles for (H2O)_{14}, 1638 basis functions
- RI-MP2/aug-cc-pVTZ numerical frequencies from analytic gradients - (H2O)_{3}, 276 basis functions
- RI-MP2/aug-cc-pVTZ numerical frequencies from analytic gradients - (H2O)_{4}, 368 basis functions

The calculations are available at /home/btemelso/benchmarks/ORCA/Marcy until they are moved to a more standard location.

A typical batch submission file for these calculations looks like this. We obviously varied the number of cores and nodes as necessary.

```tcsh
#!/bin/tcsh
##PBS -q mercury
#PBS -l mem=30gb
#PBS -l nodes=1:ppn=16
#PBS -l walltime=4:00:00
#PBS -j oe
#PBS -e j2-__NPROCS__
#PBS -N j2-__NPROCS__
#PBS -V

set echo
cd $PBS_O_WORKDIR
runorca-2.9.csh __NPROCS__ $PBS_JOBID
```
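We do not reproduce our wrapper scripts here, but as a purely hypothetical sketch, one way to generate and submit the jobs for the core counts used below would be to substitute the `__NPROCS__` placeholder into a copy of the template above:

```tcsh
#!/bin/tcsh
# Hypothetical submission loop (not one of our wrapper scripts): fill in the
# desired core count and submit with qsub. Assumes the template above is saved
# as orca.pbs.template; for multi-node runs the nodes/ppn line would also need
# to be adjusted.
foreach np (4 8 16 32 64)
    sed -e "s/__NPROCS__/$np/g" orca.pbs.template > orca-$np.pbs
    qsub orca-$np.pbs
end
```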

Since ORCA is distributed only in binary form, we used the precompiled ORCA 2.9 binaries provided by its developers:

- Version 2.9.1 for x86_64 with openmpi 1.4.4 (orca_2_9_1_linux_x86-64.tbz)

RI-MP2/aug-cc-pVTZ energies - (H2O)_{11}, 1012 basis functions

n_{cores} | Wall time (mins) | Parallel efficiency relative to t_4 (1=linear) |
---|---|---|
4 | 178 | 1.00 |
8 | 91 | 0.98 |
16 | 48 | 0.93 |
32 | 26 | 0.84 |
64 | 14 | 0.78 |
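For reference, the parallel efficiencies reported in these tables appear to be the usual wall-time ratio relative to the 4-core run, efficiency(n) = (4 × t_4) / (n × t_n), so 1.00 corresponds to perfectly linear scaling. For example, at 16 cores this gives (4 × 178) / (16 × 48) ≈ 0.93, matching the table above.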

So, plain energy calculations scale well up to 64 cores, and perhaps beyond, although we have not attempted larger runs.

Let's see how well energies scale for DFT calculations.

BLYP/QZVP energies - (H2O)_{20}, 2340 basis functions

n_{cores} | Wall time (mins) | Parallel efficiency relative to t_4 (1=linear) |
---|---|---|
4 | 14 | 1.00 |
8 | 11 | 0.66 |
16 | 9 | 0.38 |

That's not good. Maybe the calculation was simply too short for the parallel overhead to be amortized.

Let's see whether a hybrid functional scales better.

PBE0/QZVP energies - (H2O)_{20}, 2340 basis functions

n_{cores} | Wall time (mins) | Parallel efficiency relative to t_4 (1=linear) |
---|---|---|
4 | 68 | 1.00 |
8 | 38 | 0.90 |
16 | 23 | 0.75 |

Hybrid functionals like PBE0 scale better than pure ones like BLYP.

RI-MP2/aVDZ analytic gradients - 5 optimization cycles for (H2O)_{11}, 451 basis functions

n_{cores} | Wall time (mins) | Parallel efficiency relative to t_4 (1=linear) |
---|---|---|
4 | 161 | 1.00 |
8 | 86 | 0.94 |
16 | 48 | 0.84 |
32 | 30 | 0.66 |

The scaling for gradient calculations is good within a node, but degrades quickly once communication has to cross nodes over InfiniBand (IB). Part of the problem is that the processes running on each node share a globally accessible (non-local) scratch space.

We wanted to start with a hybrid functional like PBE0, which is not very amenable to density fitting but can still take advantage of ORCA's RIJCOSX approximation.
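As an illustration only (this is not one of our actual inputs, and the exact keyword and auxiliary-basis names should be checked against the ORCA 2.9 manual), a RIJCOSX-accelerated PBE0/def2-TZVPP optimization input would look roughly like this:

```
# Hypothetical ORCA input sketch: PBE0/def2-TZVPP optimization with RIJCOSX.
# The def2-TZVPP/J fitting basis and the h2o14.xyz file name are assumptions.
! PBE0 def2-TZVPP def2-TZVPP/J RIJCOSX TightSCF Opt
%pal nprocs 16 end    # number of cores, varied across the benchmarks
* xyzfile 0 1 h2o14.xyz
```

For a pure functional such as BLYP, one would instead use the plain RI-J approximation (the RI keyword with a /J fitting basis), which is what makes density fitting so beneficial in that case.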

PBE0/def2-TZVPP analytic gradients - 5 optimization cycles for (H2O)_{14}, ~1000 basis functions

n_{cores} | Wall time (mins) | Parallel efficiency relative to t_4 (1=linear) |
---|---|---|
4 | 38 | 1.00 |
8 | 24 | 0.88 |
16 | 14 | 0.69 |

Even though ORCA is parallelized with Open MPI and can run over IB if necessary, most of the multi-node jobs we attempted failed. The scaling within a node is acceptable.

BLYP is a pure functional that should benefit greatly from density fitting.

BLYP/def2-TZVPP analytic gradients - 5 optimization cycles for (H2O)_{14}, ~1000 basis functions

n_{cores} | Wall time (mins) | Parallel efficiency relative to t_4 (1=linear) |
---|---|---|
4 | 13 | 1.00 |
8 | 9 | 0.71 |
16 | 7 | 0.45 |

Once again, we couldn't run multi-node jobs successfully. The scaling within a node is ok.

BLYP/QZVP analytic gradients - 5 optimization cycles for (H2O)_{14}, 1638 basis functions

n_{cores} | Wall time (mins) | Parallel efficiency relative to t_4 (1=linear) |
---|---|---|
4 | 55 | 1.00 |
8 | 39 | 0.70 |
16 | 28 | 0.48 |

Nothing new here.

RI-MP2/aVTZ numerical frequencies from 54 analytic gradient calculations - (H2O)_{3}, 276 basis functions

n_{cores} | Wall time (mins) | Parallel efficiency relative to t_4 (1=linear) |
---|---|---|
4 | 431 | 1.00 |
8 | 248 | 0.87 |
12 | 187 | 0.77 |
16 | 129 | 0.84 |

RI-MP2/aVTZ numerical frequencies from 72 analytic gradient calculations - (H2O)_{4}, 368 basis functions

n_{cores} | Wall time (mins) | Parallel efficiency relative to t_4 (1=linear) |
---|---|---|
4 | 1620 | 1.00 |
8 | 790 | 1.03 |
12 | 472 | 1.13 |
16 | 487 | 0.87 |

Much like the RI-MP2 gradient calculations, the scaling for numerical frequency calculations using analytic gradients is very good within a node.
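As an aside, the number of gradient evaluations quoted above is consistent with two-sided displacements along every Cartesian coordinate, i.e. n_{gradients} = 6 × n_{atoms}: 6 × 9 = 54 for (H2O)_{3} and 6 × 12 = 72 for (H2O)_{4}.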

- Energy calculations using ORCA scale well, but gradient and numerical frequency calculations are better off run within a single SMP node.
- Surprisingly, MP2 calculations scale better than DFT ones.
- Equally surprisingly, hybrid functionals scale better than pure functionals.

Since gradient calculations across multiple nodes usually failed, it is probably best to keep such calculations within a node. If one chooses to run these calculations over InfiniBand, a globally accessible scratch space needs to be used. That can be achieved by using the 'runorca-2.9-local.csh' script instead of 'runorca-2.9.csh'; it uses a globally accessible ~/scratch instead of the node-specific /scratch/$GROUP/$USER as scratch space. In any case, using the globally accessible scratch space will strain our NFS-mounted I/O system, so it is highly discouraged. With that in mind, we suggest that users run ORCA calculations within a single node regardless of the type of calculation (energy, gradient, frequency, MP2, DFT, etc.).
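For completeness, the node-local staging that a wrapper like runorca-2.9.csh performs typically looks like the sketch below; this is not the actual script, and the input/output file names and the $ORCA_DIR variable are placeholders.

```tcsh
#!/bin/tcsh
# Illustrative sketch only (not runorca-2.9.csh): stage the job in node-local
# scratch, run ORCA there, and copy the results back to the submit directory.
set scrdir = /scratch/$GROUP/$USER/$PBS_JOBID
mkdir -p $scrdir
cp $PBS_O_WORKDIR/input.inp $scrdir/
cd $scrdir
$ORCA_DIR/orca input.inp >& $PBS_O_WORKDIR/input.out
cp $scrdir/*.gbw $PBS_O_WORKDIR/
rm -rf $scrdir
```

A "global scratch" variant would simply point scrdir at something like ~/scratch, which every node can see but which funnels all I/O through NFS; that is exactly the strain we want to avoid.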

It will be interesting to see how the new ORCA release compares with the current version in terms of per-core performance and scaling.

Please direct any questions to: support@mercuryconsortium.org