Running MPI jobs #
Please use the mpirun tool to run MPI jobs through the Slurm management system. By default, mpirun will launch your program on all allocated tasks (MPI processes) and cores. You can run a specific number of processes with the -n np parameter, e.g.:
mpirun -n 8 ./mpiprog8.exe
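For reference, a minimal batch script for a pure MPI job might look like the sketch below; the job name, time limit, task count, and the mpiprog8.exe executable are placeholders to adapt to your own job:
#!/bin/bash
#SBATCH --job-name=mpi-test
#SBATCH --output=mpi-test.out
#SBATCH --time=5
####### 8 MPI ranks
#SBATCH --ntasks=8
## mpirun picks up the number of allocated tasks from Slurm
mpirun ./mpiprog8.exe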
Intel MPI #
The mpiexec command will not work with Intel MPI; please use mpirun instead.
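For example, assuming the intel and impi modules used later on this page provide the Intel MPI toolchain, a program could be launched like this (mpiprog8.exe is again a placeholder executable):
module load intel
module load impi
mpirun -n 8 ./mpiprog8.exe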
OpenMPI #
You can also use Open MPI with the GCC compilers:
module load gnu openmpi
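As a quick check that the Open MPI toolchain works, you could compile and launch a small MPI program; mpiprog8.c here is a hypothetical source file, not part of this guide:
mpicc mpiprog8.c -o mpiprog8.exe
mpirun -n 8 ./mpiprog8.exe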
Hybrid MPI+OpenMP jobs #
An example MPI+OpenMP program, hybrid.c:
#include <stdio.h>
#include <omp.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int iam = 0, np = 1;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    /* Each MPI rank opens an OpenMP parallel region and every
       thread reports its node, rank and thread id. */
    #pragma omp parallel default(shared) private(iam, np)
    {
        np = omp_get_num_threads();
        iam = omp_get_thread_num();
        printf("Node: %12s rank: %2d/%2d thread_id: %2d/%2d\n",
               processor_name, rank, numprocs, iam, np);
    }

    MPI_Finalize();
    return 0;
}
Use the following commands to compile the code with Intel MPI:
module load intel
module load impi
mpiicc -qopenmp hybrid.c -o hybrid
Use the following commands to compile the code with Open MPI and gcc compilers:
module load gnu openmpi
mpicc -fopenmp hybrid.c -o hybrid
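If you are unsure which underlying compiler a wrapper calls, both wrappers can print their compile command line; -show is the Intel MPI option and --showme the Open MPI one:
mpiicc -show
mpicc --showme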
We will use the following Slurm batch script, test-hybrid.sh:
#!/bin/bash
# A hybrid MPI+OpenMP example
#SBATCH --job-name=hybrid
#SBATCH --output=hybrid.out
#SBATCH --time=5
####### 4 MPI ranks
#SBATCH --ntasks=4
####### 2 nodes
#SBATCH --nodes=2
####### 6 OMP threads per MPI rank
#SBATCH --cpus-per-task=6
## load MPI libs if needed
## for Intel:
# module load intel
# module load impi
## for Open MPI:
# module load gnu openmpi
## set the number of OMP threads to the value of --cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun ./hybrid
Slurm propagates the user's environment variables to the compute nodes, so you can skip loading the MPI modules in the script if they are already loaded in your session.
We recommend explicitly setting OMP_NUM_THREADS to the value of $SLURM_CPUS_PER_TASK. In most cases this is not strictly necessary, since each MPI process is bound to specific CPU cores and OpenMP will pick the number of threads automatically.
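If you reuse the same batch script for jobs that do not set --cpus-per-task, a slightly defensive form of the export avoids passing an empty value to OpenMP; this is only a sketch, not a required change:
## fall back to 1 thread if --cpus-per-task was not set
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}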
Enqueue the job:
sbatch test-hybrid.sh
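You can check the state of the submitted job with squeue, for example:
squeue -u $USER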
The result will be in the hybrid.out file. Example output:
Node: n09 rank: 1/ 4 thread_id: 0/ 6
Node: n09 rank: 1/ 4 thread_id: 5/ 6
Node: n10 rank: 3/ 4 thread_id: 0/ 6
Node: n10 rank: 3/ 4 thread_id: 2/ 6
Node: n09 rank: 1/ 4 thread_id: 4/ 6
Node: n09 rank: 1/ 4 thread_id: 2/ 6
Node: n09 rank: 1/ 4 thread_id: 3/ 6
Node: n09 rank: 1/ 4 thread_id: 1/ 6
Node: n10 rank: 3/ 4 thread_id: 3/ 6
Node: n10 rank: 3/ 4 thread_id: 5/ 6
Node: n10 rank: 3/ 4 thread_id: 1/ 6
Node: n10 rank: 3/ 4 thread_id: 4/ 6
Node: n09 rank: 0/ 4 thread_id: 3/ 6
Node: n09 rank: 0/ 4 thread_id: 5/ 6
Node: n09 rank: 0/ 4 thread_id: 2/ 6
Node: n09 rank: 0/ 4 thread_id: 4/ 6
Node: n09 rank: 0/ 4 thread_id: 1/ 6
Node: n09 rank: 2/ 4 thread_id: 0/ 6
Node: n09 rank: 2/ 4 thread_id: 1/ 6
Node: n09 rank: 2/ 4 thread_id: 3/ 6
Node: n09 rank: 2/ 4 thread_id: 4/ 6
Node: n09 rank: 0/ 4 thread_id: 0/ 6
Node: n09 rank: 2/ 4 thread_id: 2/ 6
Node: n09 rank: 2/ 4 thread_id: 5/ 6
The job used two nodes (n09 and n10), with the four MPI ranks distributed between them. Each MPI rank ran 6 OpenMP threads.
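If you want to confirm how Slurm placed the ranks, the standard Slurm tools can report the allocated nodes; <jobid> below is a placeholder. scontrol works for pending and running jobs, sacct for completed ones:
scontrol show job <jobid>
sacct -j <jobid> --format=JobID,NodeList,NTasks,Elapsed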