| Home | Research Groups | User Documentation | System Resources |

Test Case

A 136,000-atom system with AMBER12 pmemd

Code/binaries

  • AMBER12 compiled with MPICH2 running over InfiniBand ( /usr/local/Dist/amber12/bin/pmemd.MPI )
  • AMBER12 compiled with CUDA ( /usr/local/Dist/amber12/bin/pmemd.cuda )
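Before queueing jobs, it can be worth confirming that both binaries above are actually present and executable. A minimal sketch (POSIX sh, unlike the tcsh batch scripts below; the check_exe helper is hypothetical):

```shell
# Hypothetical helper: report whether a given binary exists and is executable.
check_exe() {
    if [ -x "$1" ]; then echo "OK: $1"; else echo "MISSING: $1"; fi
}

# Paths from the list above:
check_exe /usr/local/Dist/amber12/bin/pmemd.MPI
check_exe /usr/local/Dist/amber12/bin/pmemd.cuda
```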

To test the scaling and efficiency of AMBER, we first ran the simulations on CPU cores and then repeated them on GPUs. Those benchmarks are reported below.

Benchmarks

AMBER (CPU only)

A typical batch submission file for these calculations looks like this. We varied the NUMBER_OF_NODES parameter as needed, set the path to the CPU version of AMBER, and ran each simulation.

#!/bin/tcsh
#PBS -l nodes=__NUMBER_OF_NODES__:ppn=16
#PBS -l walltime=72:00:00
#PBS -q mercury
#PBS -j oe
#PBS -r n
#PBS -N cpu-only

set NUM=0
set RUN="part$NUM"
set IN="initial9.in"
set OLD="initial.rst"
set PRMTOP="standard.prmtop"
set MPI="mpiexec"
set EXE=pmemd.MPI
date

set DATA=$PBS_O_WORKDIR

cd $DATA

@ NCORES = __NUMBER_OF_NODES__ * 16
$MPI -np $NCORES $EXE -O -i $IN -o $RUN.out -p $PRMTOP -c $OLD -r $RUN.rst -x $RUN.mdcrd -ref $OLD
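The __NUMBER_OF_NODES__ placeholder can be filled in with sed before submission. A possible sweep over node counts (POSIX sh sketch; the template filename cpu-only.pbs.tmpl and the fill_nodes helper are hypothetical):

```shell
# Hypothetical helper: substitute the __NUMBER_OF_NODES__ placeholder in a
# template batch file and print the result.
fill_nodes() {
    sed "s/__NUMBER_OF_NODES__/$2/g" "$1"
}

# Example sweep over node counts (submission commented out):
# for n in 1 2 4 8; do
#     fill_nodes cpu-only.pbs.tmpl $n > cpu-only-$n.pbs
#     qsub cpu-only-$n.pbs
# done
```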

As shown next, the scaling of AMBER12 looks very similar on Una and Marcy up to 32 cores, since their CPUs perform equivalently. Marcy does better as one uses a larger number of cores; in fact, its performance peaks at 112 cores, compared to 80 cores for Una.

Simulation speed (ns/day)

nCPU_Cores    Una    Marcy
         2   0.32     0.30
         4   0.62     0.58
         8   1.16     1.11
        16   2.09     2.01
        32   3.63     3.62
        64   4.50     6.02
        80   5.12     6.82
        96   4.87     7.31
       112   4.50     7.85
       128      -     7.68
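The table above can be turned into normalized parallel efficiencies, taking the 2-core run as the baseline: eff(n) = (speed(n)/speed(2)) / (n/2). A small sh/awk sketch (the eff_table helper is hypothetical), fed the Marcy column:

```shell
# Hypothetical helper: read "cores ns_per_day" pairs on stdin and print the
# parallel efficiency of each run relative to the first (baseline) line:
#   eff(n) = (speed_n / speed_base) / (n / n_base)
eff_table() {
    awk 'NR == 1 { bn = $1; bs = $2 }
         { printf "%d %.3f\n", $1, ($2 / bs) / ($1 / bn) }'
}

# Marcy figures from the table above:
eff_table <<'EOF'
2 0.30
4 0.58
8 1.11
16 2.01
32 3.62
64 6.02
128 7.68
EOF
```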

The normalized efficiency looks like this:

Above 96 cores, the parallel efficiency on Marcy dips below 50%.

AMBER12 (GPU)

We then ran these same simulations on one of our two GPU-containing nodes. All tests were run on node22, since node21 still appears to have issues.

#!/bin/tcsh
#PBS -q gpu
#PBS -l walltime=72:00:00
#PBS -j oe
#PBS -r n
#PBS -N gpu-only

set NUM=21
set RUN="part$NUM"
set IN="initial9.in"
set OLD="initial.rst"
set PRMTOP="standard.prmtop"
set MPI="mpiexec"
set EXE=pmemd.cuda
date

set DATA=$PBS_O_WORKDIR
cd $DATA
cat $PBS_NODEFILE
cat $PBS_GPUFILE

source /usr/local/Modules/3.2.10/init/tcsh
module load cuda/4.2

nvidia-smi --loop=1 > gputil.log &

$EXE -O -i $IN -o $RUN.out -p $PRMTOP -c $OLD -r $RUN.rst -x $RUN.mdcrd -ref $OLD

kill `pgrep  nvidia-smi`

The 'nvidia-smi --loop=1 > gputil.log &' line was added to monitor the GPU utilization every second as the calculations were running.
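Afterwards, gputil.log can be summarized by averaging the GPU-Util column. A sketch (the avg_util helper is hypothetical; it assumes nvidia-smi's default table layout, where GPU-Util is the last 'NN%' token on each device row):

```shell
# Hypothetical helper: average the GPU-Util column of an nvidia-smi table log.
# Assumes GPU-Util is the last "NN%" token on each device row.
avg_util() {
    awk '{
            u = ""
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+%$/) u = $i      # keep the last NN% token
            if (u != "") { sub(/%/, "", u); s += u; n++ }
         }
         END { if (n) printf "%.1f%%\n", s / n }'
}

# Usage: avg_util < gputil.log
```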

As shown next, a single GPU outperforms even the best CPU-only runs on either cluster.

GPU Simulation speed (ns/day)

Configuration            ns/day
Tesla K20 on Marcy        11.82
GTX 680 (Adam's)          10.46
Marcy peak (112 cores)     7.85
Una peak (80 cores)        5.12

Therefore, the performance of one GPU (either the GTX 680 or the Tesla K20) is better than that of at least 128 CPU cores on both Marcy and Una. The Tesla K20 may do even better if we compile and run it with the latest version of CUDA (5.0) instead of the older one (4.2).
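Rebuilding pmemd.cuda against CUDA 5.0 would look roughly like this (a sketch of the standard AMBER12 configure flow; the module name and CUDA install path are assumptions about our setup):

```shell
# Sketch only: rebuild the GPU binary against CUDA 5.0 (tcsh syntax;
# module name and CUDA_HOME path are assumptions).
module load cuda/5.0
setenv CUDA_HOME /usr/local/cuda-5.0
cd $AMBERHOME
./configure -cuda gnu
make install
```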

Conclusions

CPU jobs

The bottom line is that with our moderately large test system and AMBER12, both Marcy and Una scale extremely well up to 32 cores, but efficiency starts to drop off above that. For maximum performance, we went out to 112 cores, beyond which excess parallelization actually slowed the job. The maximum speed I observed was 7.85 ns/day at 112 cores. An individual Marcy core performs about the same as an individual Una core, but Marcy scales much better than Una when one uses more than 32 cores.

GPU jobs

Another worthwhile comparison is with our GPU server: with one GTX 680, we get 10.5 ns/day on this system with AMBER12-cuda. On Marcy's Nvidia Tesla K20, we got 11.82 ns/day, which would be impressive if the K20 didn't cost 5X more than the GTX 680.

documentation/benchmarks/amber.txt · Last modified: 2013/10/14 11:06 by btemelso






Sponsored by the Mercury Consortium.

Please direct any questions to: support@mercuryconsortium.org

 