

| Home | Research Groups |User Documentation | System Resources |


This page collects benchmarks run on Marcy and a set of best practices based on them.


NAMD

NAMD can run on CPU cores alone or on a hybrid of CPU cores and GPUs. We benchmarked three test cases on Marcy; the results can be found here: NAMD Benchmarks

The bottom line is:

  • NAMD scales very well up to the 256 cores we tested, so one can safely use a large number of cores without a significant decline in parallel efficiency.
  • If GPU nodes are available, use them for about a 3X performance boost: 16 CPU cores plus one NVIDIA Tesla K20 GPU run roughly three times as fast as 16 CPU cores alone.
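
Parallel efficiency, as used above, is the speedup over a baseline run divided by the increase in core count. A minimal Python sketch of that calculation follows. The function name and the throughput numbers are hypothetical placeholders, not values from the NAMD benchmarks on Marcy.

```python
def parallel_efficiency(base_cores, base_rate, cores, rate):
    """Return (speedup, efficiency) of a run relative to a baseline run.

    Rates are throughputs (for example ns/day), so higher is better.
    """
    speedup = rate / base_rate
    efficiency = speedup / (cores / base_cores)
    return speedup, efficiency


# Hypothetical throughputs, for illustration only (not measured on Marcy).
speedup, efficiency = parallel_efficiency(base_cores=16, base_rate=1.0,
                                          cores=256, rate=14.4)
print(f"speedup {speedup:.1f}x, efficiency {efficiency:.0%}")
```

An efficiency close to 1 means the extra cores are being used well; values well below 1 suggest the job is past its useful core count.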


AMBER

Adam benchmarked AMBER jobs on Marcy and compared the results with those from his cluster, Una. We have added some GPU benchmarks for comparison. You can find them all here:

AMBER Benchmarks

CPU jobs

The bottom line is that with our moderately large test system and AMBER12, both Marcy and Una scale extremely well up to 32 cores, but efficiency starts to drop off above that. Pushing for maximum performance, we got out to 112 cores before excess parallelization actually slowed the job down. The maximum throughput I observed was 7.85 ns/day on 112 cores. An individual Marcy core performs about the same as an individual Una core, but Marcy scales much better than Una beyond 32 cores.
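
One way to read a scaling table like this is to separate the core count with the highest throughput from the largest core count that still runs efficiently. The Python sketch below assumes a list of (cores, ns/day) pairs. Apart from the 112-core, 7.85 ns/day entry quoted above, every number in it is a made-up placeholder, and the efficiency cutoffs are arbitrary choices rather than a site recommendation.

```python
def pick_core_counts(runs, min_efficiency=0.7):
    """runs: list of (cores, ns_per_day) pairs, in any order.

    Returns (fastest, efficient): the core count with the highest
    throughput, and the largest core count whose parallel efficiency
    relative to the smallest run is at least min_efficiency.
    """
    runs = sorted(runs)
    base_cores, base_rate = runs[0]
    fastest = max(runs, key=lambda r: r[1])[0]
    efficient = base_cores
    for cores, rate in runs:
        efficiency = (rate / base_rate) / (cores / base_cores)
        if efficiency >= min_efficiency:
            efficient = cores
    return fastest, efficient


# Placeholder data: only the (112, 7.85) point comes from the text above.
runs = [(16, 1.8), (32, 3.5), (64, 5.9), (112, 7.85), (128, 7.4)]
print(pick_core_counts(runs, min_efficiency=0.9))
```

With a strict cutoff the efficient core count sits well below the fastest one, which is the trade-off between time to solution and core-hours spent.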

GPU jobs

Another worthwhile comparison is with our GPU server: with one GTX 680, we get 10.5 ns/day on this system with AMBER12-cuda. On Marcy's NVIDIA Tesla K20, we got 11.82 ns/day (about 13% faster), which would be impressive if the K20 hadn't cost 5X more than the GTX 680.
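
The cost argument can be made explicit by dividing throughput by relative price. The Python sketch below uses the two ns/day figures quoted above and takes "5X" to mean that the K20 cost five times the price of the GTX 680, which is an interpretation. The prices are relative units, not actual dollar amounts.

```python
# Throughputs (ns/day) are the AMBER12-cuda figures quoted above.
# Relative prices are an assumption: GTX 680 = 1 unit, K20 = 5 units.
cards = {
    "GTX 680": {"ns_per_day": 10.5, "relative_price": 1.0},
    "Tesla K20": {"ns_per_day": 11.82, "relative_price": 5.0},
}


def throughput_per_price(card):
    """ns/day delivered per unit of relative price."""
    return card["ns_per_day"] / card["relative_price"]


speedup = cards["Tesla K20"]["ns_per_day"] / cards["GTX 680"]["ns_per_day"]
value_ratio = (throughput_per_price(cards["Tesla K20"])
               / throughput_per_price(cards["GTX 680"]))

print(f"K20 speedup over GTX 680: {speedup:.2f}x")
print(f"K20 throughput per unit price relative to GTX 680: {value_ratio:.2f}")
```

Under that pricing assumption, the K20 delivers a bit less than a quarter of the GTX 680's throughput per unit price on this test system.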


ORCA

We benchmarked seven ORCA test cases on Marcy; the results can be found here: ORCA Benchmarks

Two were RI-MP2 energy and gradient calculations; the remaining five were energy and gradient calculations using a pure (BLYP) and a hybrid (PBE0) density functional. Here are the conclusions we reached:

  • RI-MP2 energy calculations using ORCA scale well, but gradients and numerical frequencies are better run within a single SMP node.
  • Surprisingly, MP2 calculations scale better than DFT calculations.
  • Equally surprisingly, hybrid functionals scale better than pure functionals.
documentation/benchmarks.txt · Last modified: 2013/10/22 18:09 by btemelso


Sponsored by the Mercury Consortium.

Please direct any questions to: support@mercuryconsortium.org
