How to Make Your LAMMPS Run Faster
Please note: This page is designed for advanced users only! The methods discussed here are still under extensive testing. If you find problems, please contact the system administrator.
LAMMPS is a classical molecular dynamics program with potentials for simulating solid-state materials, soft matter, and coarse-grained systems. LAMMPS can be run on a single processor or in parallel using MPI. For more information about LAMMPS, please refer to its official webpage. This document discusses how to make your LAMMPS runs faster.
To use the methods discussed here, your LAMMPS binary version must be no earlier than Jan 1, 2015.
Methods for a significant speed-up
- (For jobs without Phi offloading) Use the Intel package.
The Intel package implemented in recent versions of LAMMPS will speed up your simulation significantly. The following lines should be added at the beginning of the LAMMPS input file. Here, the input file is called "lmp.in".
package intel 0 mode mixed balance -1
package omp 0
suffix intel
processors * * * grid numa
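These package settings only take effect when the Intel-accelerated styles are selected (via the "suffix intel" command in the input, or the "-sf intel" command-line flag). For reference, a complete minimal input is sketched below as a shell here-document that writes "lmp.in"; the physics section is adapted from the standard LAMMPS Lennard-Jones melt benchmark and is illustrative only.

```shell
# Write a minimal "lmp.in" with the acceleration settings prepended.
# The physics below (LJ melt benchmark) is illustrative only; replace
# it with your own system.
cat > lmp.in <<'EOF'
package intel 0 mode mixed balance -1
package omp 0
suffix intel
processors * * * grid numa

units lj
atom_style atomic
lattice fcc 0.8442
region box block 0 10 0 10 0 10
create_box 1 box
create_atoms 1 box
mass 1 1.0
velocity all create 1.44 87287 loop geom
pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5
neighbor 0.3 bin
neigh_modify delay 0 every 20 check no
fix 1 all nve
run 100
EOF
```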
The sample PBS script is:
#PBS -l walltime=2:00:00 # WALLTIME
#PBS -l nodes=2:ppn=16 # Number of nodes and processes per node
#PBS -l feature=16core
#PBS -N lmp_test
#PBS -o std.out
#PBS -e std.err
#PBS -A [Your Project Account]
module use /nopt/nrel/apps/modules/candidate/modulefiles
module load impi-intel/4.1.3-14.0.2 mkl/14.2.144 lammps/10Feb2015-phi
mpirun -np 32 lmp_intel_phi -in lmp.in -l lmp.out
#Print out the time used in the most recent run
grep Loop lmp.out | tail -1 > time.log
This sample PBS script requests two 16-core nodes and runs LAMMPS with 32 MPI ranks, so that there is one MPI rank per core. Please note that the executable "lmp_intel_phi" is used instead of the "lmp" executable used in typical LAMMPS runs.
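Rather than hard-coding "-np 32", the rank count can be derived from the scheduler: under PBS, $PBS_NODEFILE contains one line per allocated core (nodes × ppn lines in total). A minimal sketch, with a fallback node file so it can be tested outside a PBS job:

```shell
# Derive the MPI rank count from the PBS node file, which lists one
# line per allocated core.
nodefile="${PBS_NODEFILE:-nodefile.txt}"   # fallback path for testing outside PBS

# When not running under PBS, create a sample node file matching the
# example above: 2 nodes x 16 cores (the node names are placeholders).
if [ ! -f "$nodefile" ]; then
    for node in node01 node02; do
        for i in $(seq 1 16); do echo "$node"; done
    done > "$nodefile"
fi

NP=$(wc -l < "$nodefile")   # total ranks = total allocated cores
echo "mpirun -np $NP lmp_intel_phi -in lmp.in -l lmp.out"
```

With the resource request above this yields "-np 32", and the same script then works unchanged if the node count or ppn is altered.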
- (Running jobs on Phi nodes) Use the Xeon Phi coprocessors.
LAMMPS supports accelerating a simulation by offloading the neighbor-list builds and non-bonded force calculations to the Phi card. In order to use the Phi card, the user needs to request Phi nodes first and also change the first line of the LAMMPS input to
package intel 2 mode mixed balance -1
The "2" means this job will use 2 Phi cards per node. Please refer to this page for more information.
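Requesting Phi nodes is done in the PBS resource list. The exact feature string is site-specific, so the "phi" feature below is an assumption; check your system's queue documentation for the correct name:

```shell
#PBS -l walltime=2:00:00
#PBS -l nodes=2:ppn=16       # two 16-core host nodes
#PBS -l feature=phi          # hypothetical feature name for Phi-equipped nodes
```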
- (For jobs with or without Phi offloading) Use OpenMP.
Users may investigate the influence of the OpenMP parameter $OMP_NUM_THREADS for slightly better performance. For instance, the example above used the default value $OMP_NUM_THREADS=1, that is, one OpenMP thread per MPI rank. If $OMP_NUM_THREADS is set to 2 and "mpirun -np 16" is used instead, LAMMPS will run with 16 MPI ranks and 2 OpenMP threads per rank. The total number of threads is therefore still 32, that is, one thread per core. To do this, change the "mpirun" line in the sample PBS script to
env OMP_NUM_THREADS=2 mpirun -np 16 ...
For multi-node jobs, setting $OMP_NUM_THREADS to a value larger than 1 can sometimes make LAMMPS run roughly 10% faster.
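The best rank/thread split is system-dependent, so it is worth scanning a few values while keeping ranks × threads fixed at the total core count. A sketch of such a scan (the mpirun commands are only echoed here; remove "echo" to actually launch the runs, then compare the Loop times with the grep command from the sample script):

```shell
# Sweep OMP_NUM_THREADS while keeping ranks x threads constant at 32
# (two 16-core nodes, one thread per core).
TOTAL=32
for threads in 1 2 4; do
    ranks=$((TOTAL / threads))
    echo "env OMP_NUM_THREADS=$threads mpirun -np $ranks lmp_intel_phi -in lmp.in -l lmp_t$threads.out"
done
```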