HUMMR is parallelized in a hybrid OpenMP/MPI manner to utilize multiple CPU cores (OMP) and multiple
computer nodes (MPI).
There are two options for setting the number of OMP threads:
1. Set the environment variable: export OMP_NUM_THREADS=<nthreads>.
2. Write NThreads=<nthreads> in the General block of the HUMMR input file.
With either option set, HUMMR may then be run in parallel simply by:
hummr calculation.inp
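For example, on a single node with 8 physical cores (the thread count and input file name are only illustrative), a pure OpenMP run could look like:
export OMP_NUM_THREADS=8
hummr calculation.inp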
Running with MPI can be done as follows:
mpirun --bind-to none -n <nprocs> hummr calculation.inp
The --bind-to none option is necessary so that the MPI processes are not bound to individual cores and their OMP threads can be distributed among the available hardware threads.
Note
By following the above steps, <nthreads> × <nprocs> parallel threads will be created in total. It is advised to set <nprocs> equal to the number of computer nodes used and <nthreads> equal to the number of physical cores per computer node. That being said, it is possible to run HUMMR with arbitrary combinations of <nthreads>/<nprocs>.
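As an illustration of this advice, a hybrid run on 2 computer nodes with 16 physical cores each (the numbers are only an example and should be adapted to the actual hardware) could be launched, e.g. from inside a job allocation, as:
export OMP_NUM_THREADS=16
mpirun --bind-to none -n 2 hummr calculation.inp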
Warning
The OMP/MPI hybrid parallelization is an ongoing effort, and some parts of the program are still parallelized with MPI only. If you encounter problems, please contact the developers.
When running HUMMR on a cluster, it is essential to utilize a queuing system such as PBS (Portable Batch System) or SLURM (Simple Linux Utility for Resource Management) for efficient job management. These programs allow you to submit jobs, manage resources, and ensure that your computations run smoothly across multiple nodes.
To submit a job using PBS, you need to create a submission script that specifies the job parameters and the required compute resources. Below is an example Python script that outlines the basic boilerplate for running HUMMR jobs on computer clusters managed by PBS.
#!/usr/bin/python
import os, sys
from string import Template

# Ensure we get three arguments: calc_fname (input file), nprocs (number of MPI
# processes), and nthreads (number of OpenMP threads).
try:
    calc_fname = sys.argv[1]   # The input file for the calculation
    nprocs = sys.argv[2]       # Number of MPI processes to be launched
    nthreads = sys.argv[3]     # Number of OpenMP threads to be used per MPI process
except IndexError:
    # Print usage instructions if the arguments are not correctly parsed
    print("Failed to parse input, correct usage of the script: \n"
          "\tsub_hummr_pbs.py <input-file> <nprocs> <nthreads>")
    quit()

# Get the prefix of the calculation filename (used for output filenames)
calc_fname_prefix = calc_fname.split(".")[0]

# Template for the job script to be created
job_script = Template("""#!/bin/bash
# Explanation of the PBS directives:
# https://2021.help.altair.com/2021.1.2/PBS%20Professional/PBSUserGuide2021.1.2.pdf
#PBS -l nodes=$nprocs
#PBS -r n
#PBS -j eo

# Load the essential modules and set up environment variables
# (in particular SCRATCHDIR, the scratch directory used below)

# Create the scratch directory on each node and copy the input files there
for node in `cat ${PBS_NODEFILE} | uniq`
do
    ssh ${node} "mkdir -p ${SCRATCHDIR}; \
                 cp ${PBS_O_WORKDIR}/* ${SCRATCHDIR}"
done

# Change to the scratch directory for job execution
cd ${SCRATCHDIR}

# Set the number of OpenMP threads based on the provided argument
export OMP_NUM_THREADS=$nthreads

# Run the main program using mpirun with the specified number of processes
mpirun --bind-to none -np $nprocs --hostfile ${PBS_NODEFILE} -x LD_LIBRARY_PATH \
    hummr $calc_fname >> ${PBS_O_WORKDIR}/$calc_fname_prefix.out 2>&1

# Copy the results from the scratch directory to the submission directory
cp -p ${SCRATCHDIR}/*.* ${PBS_O_WORKDIR}
rm ${SCRATCHDIR}/node0/*tmp                  # Clean up temporary files
cp -pr ${SCRATCHDIR}/node0 ${PBS_O_WORKDIR}  # Copy results from the first node
cd ${PBS_O_WORKDIR}                          # Return to the submission directory
rm -rf ${SCRATCHDIR}                         # Remove the scratch directory after the job is done
""")

# Substitute values for nprocs, nthreads, calc_fname, and calc_fname_prefix
job_script = job_script.safe_substitute(nprocs=nprocs, nthreads=nthreads,
                                        calc_fname=calc_fname,
                                        calc_fname_prefix=calc_fname_prefix)

# Write the job script to a file with a .job extension
with open("{}.job".format(calc_fname_prefix), "w") as outf:
    outf.write(job_script)

# Submit the job to the PBS queue
os.system("qsub {}.job".format(calc_fname_prefix))
Info
For more information on the usage of PBS, please refer to the PBS user guide (https://2021.help.altair.com/2021.1.2/PBS%20Professional/PBSUserGuide2021.1.2.pdf).
To use the job script, make it executable with chmod +x sub_hummr_pbs.py and run it with the required arguments:
./sub_hummr_pbs.py <input-file> <nprocs> <nthreads>
For clusters managed by SLURM, an analogous submission script (sub_hummr_slurm.py) can be used:
#!/usr/bin/python
import os, sys
from string import Template

try:
    calc_fname = sys.argv[1]   # The input file for the calculation.
    nprocs = sys.argv[2]       # Number of MPI processes to be launched.
    nthreads = sys.argv[3]     # Number of OMP threads to be used per MPI process.
except IndexError:
    print("Failed to parse input, correct usage of the script: \n"
          "\tsub_hummr_slurm.py <input-file> <nprocs> <nthreads>")
    quit()

# Get the prefix of the calculation filename (used for output filenames)
calc_fname_prefix = calc_fname.split(".")[0]

# Template for the job script to be created
job_script = Template("""#!/bin/bash
# Explanation of the SBATCH directives: https://slurm.schedmd.com/sbatch.html
#SBATCH --nodes=$nprocs
#SBATCH --ntasks=$nprocs
#SBATCH --cpus-per-task=$nthreads
#SBATCH --output=%x.e%j
#SBATCH --error=%x.e%j

# The starting and temporary work (scratch) directories.
STARTDIR=$(pwd)
WORKDIR=/scratch/job.$SLURM_JOB_ID.$USER

# Make the work directory on each node and copy the input files there.
srun mkdir -p $WORKDIR
for file in $STARTDIR/* ; do fil=$(basename $file); sbcast -f $file $WORKDIR/$fil ; done
cd $WORKDIR

# HUMMR is launched with $nprocs MPI processes and $nthreads OMP threads.
export OMP_NUM_THREADS=$nthreads
mpirun --bind-to none -n $nprocs hummr $calc_fname > $STARTDIR/$calc_fname_prefix.out

# Copy back the result files and remove the work directory.
srun cp -f $WORKDIR/$calc_fname_prefix.C0* $STARTDIR/
srun rm -rf $WORKDIR
""")

# Substitute values for nprocs, nthreads, calc_fname, and calc_fname_prefix
job_script = job_script.safe_substitute(nprocs=nprocs, nthreads=nthreads,
                                        calc_fname=calc_fname,
                                        calc_fname_prefix=calc_fname_prefix)

# Write the job script to a file with a .job extension
with open("{}.job".format(calc_fname_prefix), "w") as outf:
    outf.write(job_script)

# Submit the job to the SLURM queue
os.system("sbatch {}.job".format(calc_fname_prefix))
Info
For more information on the usage of SLURM, please refer to the SLURM documentation.
After giving it execute permission with chmod +x sub_hummr_slurm.py, the above script can be run in a terminal by providing the required arguments:
./sub_hummr_slurm.py <input-file> <nprocs> <nthreads>