HUMMR is parallelized in a hybrid OpenMP/MPI manner to utilize multiple CPU cores (OMP) and multiple
computer nodes (MPI).
There are two options for setting the number of OMP threads:
1. Set the environment variable: export OMP_NUM_THREADS=<nthreads>.
2. Write NThreads=<nthreads> in the General block of the HUMMR input file.
With either option set, HUMMR may then be run in parallel simply by:
hummr calculation.inp
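For example, on a single node with 8 physical cores (the thread count and input file name are only illustrative), a pure OpenMP run could look like:
export OMP_NUM_THREADS=8
hummr calculation.inp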
Running with MPI can be done as follows:
mpirun --bind-to none -n <nprocs> hummr calculation.inp
The --bind-to none option is necessary so that the MPI processes are not bound to individual cores and their OMP threads can be distributed among the available hardware threads.
Note
By following the above steps, <nthreads> × <nprocs> parallel threads will be created in total. It is advised to set <nprocs> equal to the number of computer nodes used and <nthreads> equal to the number of physical cores per computer node. That being said, it is possible to run HUMMR with arbitrary combinations of <nthreads>/<nprocs>.
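As an illustration of this advice, a hybrid run on 2 computer nodes with 16 physical cores each (the numbers are only an example and should be adapted to the actual hardware) could be launched, e.g. from inside a job allocation, as:
export OMP_NUM_THREADS=16
mpirun --bind-to none -n 2 hummr calculation.inp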
Warning
The OMP/MPI hybrid parallelization is an ongoing effort, and some parts of the program are still parallelized with MPI only. If you encounter problems, please contact the developers.
When running HUMMR on a cluster, it is essential to utilize a queuing system such as PBS (Portable Batch System) or SLURM (Simple Linux Utility for Resource Management) for efficient job management. These programs allow you to submit jobs, manage resources, and ensure that your computations run smoothly across multiple nodes.
To submit a job using PBS, you need to create a submission script that specifies the job parameters and the required compute resources. Below is an example Python script that outlines the basic boilerplate for running HUMMR jobs on computer clusters managed by PBS.
#!/usr/bin/python
import os, sys
from string import Template

# Ensure we get three arguments: calc_fname (input file), nprocs (number of MPI
# processes), and nthreads (number of OpenMP threads).
try:
    calc_fname = sys.argv[1]   # The input file for the calculation
    nprocs = sys.argv[2]       # Number of MPI processes to be launched
    nthreads = sys.argv[3]     # Number of OpenMP threads to be used per MPI process
except IndexError:
    # Print usage instructions if the arguments are not correctly parsed
    print("Failed to parse input, correct usage of the script: \n"
          "\tsub_hummr_pbs.py <input-file> <nprocs> <nthreads>")
    quit()

# Get the prefix of the calculation filename (used for output filenames)
calc_fname_prefix = calc_fname.split(".")[0]

# Template for the job script to be created
job_script = Template("""#!/bin/bash
# Explanation of the PBS directives:
# https://2021.help.altair.com/2021.1.2/PBS%20Professional/PBSUserGuide2021.1.2.pdf
#PBS -l nodes=$nprocs
#PBS -r n
#PBS -j eo

# Load the essential modules and set up environment variables
# (in particular SCRATCHDIR, the scratch directory used below)

# Create the scratch directory on each node and copy the input files there
for node in `cat ${PBS_NODEFILE} | uniq`
do
    ssh ${node} "mkdir -p ${SCRATCHDIR}; \
                 cp ${PBS_O_WORKDIR}/* ${SCRATCHDIR}"
done

# Change to the scratch directory for job execution
cd ${SCRATCHDIR}

# Set the number of OpenMP threads based on the provided argument
export OMP_NUM_THREADS=$nthreads

# Run the main program using mpirun with the specified number of processes
mpirun --bind-to none -np $nprocs --hostfile ${PBS_NODEFILE} -x LD_LIBRARY_PATH \
    hummr $calc_fname >> ${PBS_O_WORKDIR}/$calc_fname_prefix.out 2>&1

# Copy the results from the scratch directory to the submission directory
cp -p ${SCRATCHDIR}/*.* ${PBS_O_WORKDIR}
rm ${SCRATCHDIR}/node0/*tmp                  # Clean up temporary files
cp -pr ${SCRATCHDIR}/node0 ${PBS_O_WORKDIR}  # Copy results from the first node
cd ${PBS_O_WORKDIR}                          # Return to the submission directory
rm -rf ${SCRATCHDIR}                         # Remove the scratch directory after the job is done
""")

# Substitute values for nprocs, nthreads, calc_fname, and calc_fname_prefix
job_script = job_script.safe_substitute(nprocs=nprocs, nthreads=nthreads,
                                        calc_fname=calc_fname,
                                        calc_fname_prefix=calc_fname_prefix)

# Write the job script to a file with a .job extension
with open("{}.job".format(calc_fname_prefix), "w") as outf:
    outf.write(job_script)

# Submit the job to the PBS queue
os.system("qsub {}.job".format(calc_fname_prefix))
Info
For more information on the usage of PBS, please refer to the PBS user guide (https://2021.help.altair.com/2021.1.2/PBS%20Professional/PBSUserGuide2021.1.2.pdf).
To use the job script, make it executable with chmod +x sub_hummr_pbs.py and run it with the required arguments:
./sub_hummr_pbs.py <input-file> <nprocs> <nthreads>
For clusters managed by SLURM, an analogous submission script (sub_hummr_slurm.py) can be used:
#!/usr/bin/python
import os, sys
from string import Template

try:
    calc_fname = sys.argv[1]   # The input file for the calculation.
    nprocs = sys.argv[2]       # Number of MPI processes to be launched.
    nthreads = sys.argv[3]     # Number of OMP threads to be used per MPI process.
except IndexError:
    print("Failed to parse input, correct usage of the script: \n"
          "\tsub_hummr_slurm.py <input-file> <nprocs> <nthreads>")
    quit()

# Get the prefix of the calculation filename (used for output filenames)
calc_fname_prefix = calc_fname.split(".")[0]

# Template for the job script to be created
job_script = Template("""#!/bin/bash
# Explanation of the SBATCH directives: https://slurm.schedmd.com/sbatch.html
#SBATCH --nodes=$nprocs
#SBATCH --ntasks=$nprocs
#SBATCH --cpus-per-task=$nthreads
#SBATCH --output=%x.e%j
#SBATCH --error=%x.e%j

# The starting and temporary work (scratch) directories.
STARTDIR=$(pwd)
WORKDIR=/scratch/job.$SLURM_JOB_ID.$USER

# Make the work directory on each node and copy the input files there.
srun mkdir -p $WORKDIR
for file in $STARTDIR/* ; do fil=$(basename $file); sbcast -f $file $WORKDIR/$fil ; done
cd $WORKDIR

# HUMMR is launched with $nprocs MPI processes and $nthreads OMP threads.
export OMP_NUM_THREADS=$nthreads
mpirun --bind-to none -n $nprocs hummr $calc_fname > $STARTDIR/$calc_fname_prefix.out

# Copy back the result files and remove the work directory.
srun cp -f $WORKDIR/$calc_fname_prefix.C0* $STARTDIR/
srun rm -rf $WORKDIR
""")

# Substitute values for nprocs, nthreads, calc_fname, and calc_fname_prefix
job_script = job_script.safe_substitute(nprocs=nprocs, nthreads=nthreads,
                                        calc_fname=calc_fname,
                                        calc_fname_prefix=calc_fname_prefix)

# Write the job script to a file with a .job extension
with open("{}.job".format(calc_fname_prefix), "w") as outf:
    outf.write(job_script)

# Submit the job to the SLURM queue
os.system("sbatch {}.job".format(calc_fname_prefix))
Info
For more information on the usage of SLURM, please refer to the SLURM documentation.
After giving it execute permission with chmod +x sub_hummr_slurm.py, the above script can be run in a terminal by providing the required arguments:
./sub_hummr_slurm.py <input-file> <nprocs> <nthreads>