
Slurm queue management system #

We use the Slurm resource manager to schedule all user jobs on our cluster. A job is an ordinary shell script submitted to the queue with the sbatch command. This script contains the information about the requested resources (number of nodes, amount of RAM, required time), and you can use the usual bash commands inside it. The system executes the script only on the first reserved core; the user should take care of launching their programs on the other cores, e.g. with the mpirun tool.

Example #

Let’s start with a simple example: a sleep job that requests just one core and a maximum execution time of 5 minutes. Create the file sleep.sh with the following content:

#!/bin/bash
#SBATCH --job-name sleep
#SBATCH --nodes=1
#SBATCH --time=05:00

echo "Start date: $(date)"
sleep 60
echo "  End date: $(date)"

You can enqueue this job with the sbatch command:

sbatch sleep.sh

Lines starting with the prefix #SBATCH define the parameters for the sbatch command. You can also specify these parameters explicitly on the command line, e.g.:

sbatch  --job-name sleep  --nodes=1  --time=5:00  sleep.sh
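If the job is accepted, sbatch prints the assigned job ID, and after the job finishes its output appears in slurm-<job_id>.out in the submission directory. A sample session for the sleep.sh example above (the job ID and timestamps are illustrative, not real output from our cluster):

```shell
$ sbatch sleep.sh
Submitted batch job 123456

$ cat slurm-123456.out
Start date: Mon Jul  1 12:00:00 UTC 2024
  End date: Mon Jul  1 12:01:00 UTC 2024
```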

sbatch command #

The main parameters for the sbatch command:

  • -D path or --chdir=path
    The working directory for the job; by default, the current directory is used.
  • -e path/file or --error=path/file
  • -o path/file or --output=path/file
    The file names for the standard error (stderr) and standard output (stdout) streams; by default, both streams are joined into one file slurm-<job_id>.out in the current directory.
  • --mail-type=NONE,FAIL,BEGIN,END,ALL and others
    Select the types of events for notifying the user via e-mail: NONE (no notifications), FAIL (job failure), BEGIN (job start), END (job completion). You can select several events. More details in the manual man sbatch.
  • --mail-user=e-mail
    The e-mail address for notifications; by default, the owner of the job.
  • -J name or --job-name=name
    The name of the job.
  • -p queue or --partition=queue
    The execution queue (partition) for the job. Several queues are available; the default one is x8core.
  • -n N or --ntasks=N
    Ask for N processes.
  • -N N or --nodes=N
    Ask for N nodes.
  • --nodes=N --ntasks-per-node=M
    Ask for N nodes with M processes on each node.
  • --cpus-per-task=N
    Ask for N CPU cores for each process (e.g. for MPI+OpenMP hybrid jobs). Default is 1 CPU core per process.
  • --mem=size
    The required memory size per node. The size is a number with one of the suffixes K, M, G. E.g., --mem=16G will request 16 GB of memory on each node.
  • -t time or --time=time
    The maximum execution time for the job. The job will be terminated after this time is exceeded. The default time limit is 6 hours; the maximum time you can request is 24 hours. E.g., --time=1:45:00 will stop the execution of the job after 1 hour and 45 minutes.
  • -C list or --constraint=list
    A comma-delimited list of extra constraints for nodes. The current constraints are: ib (working InfiniBand); avx, avx2, avx512 (CPUs with support for the AVX, AVX2 and AVX-512 instruction set extensions).
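Several of these parameters are usually combined in one job file. A sketch of a hybrid MPI+OpenMP job file (the program name my_program and the concrete resource numbers are placeholders, not part of this documentation):

```shell
#!/bin/bash
#SBATCH --job-name=hybrid-demo
#SBATCH --partition=x8core
#SBATCH --nodes=2              # 2 nodes...
#SBATCH --ntasks-per-node=4    # ...with 4 MPI processes on each node
#SBATCH --cpus-per-task=2      # 2 OpenMP threads per MPI process
#SBATCH --mem=16G              # 16 GB of memory on each node
#SBATCH --time=1:45:00         # terminate after 1 hour 45 minutes
#SBATCH --mail-type=FAIL,END   # e-mail on failure and on completion

# Tell OpenMP how many threads each MPI process may use
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

mpirun ./my_program            # my_program is a placeholder binary
```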

Some useful environment variables defined by the Slurm management system:

  • SLURM_SUBMIT_DIR
    The directory from which the user submitted the job.
  • SLURM_JOB_ID
    The unique ID of the job.
  • SLURMD_NODENAME
    The hostname of the current node.
  • SLURM_NTASKS
    The number of tasks (processes) allocated to the job; this equals the number of allocated CPU cores when --cpus-per-task=1 (the default).
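These variables can be used directly inside a job script, for example to log where the job ran. A minimal sketch of such a job file:

```shell
#!/bin/bash
#SBATCH --job-name=env-demo
#SBATCH --nodes=1
#SBATCH --time=05:00

echo "Submitted from: $SLURM_SUBMIT_DIR"
echo "Job ID:         $SLURM_JOB_ID"
echo "Current node:   $SLURMD_NODENAME"
echo "Tasks:          $SLURM_NTASKS"
```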

You can find more information in the manual man sbatch.

squeue command #

You can check the status of the jobs with the squeue command. To get the information for a specific user, run squeue -u user. The current state of each job is shown in the ST column.

  • PD — the job is waiting in the queue for requested resources.
  • R — the job is running.
  • Other states are documented in the manual man squeue.

You can get the list of nodes allocated for your job in the NODELIST column.

The command squeue -l will show the requested time for each job, and squeue --start will show the estimated start time of a pending job.
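A sample of typical squeue output (the user name, job IDs, and node names below are illustrative):

```shell
$ squeue -u alice
 JOBID PARTITION     NAME   USER ST   TIME  NODES NODELIST(REASON)
 12346   x12core  big-pro  alice PD   0:00      2 (Resources)
 12345    x8core    sleep  alice  R   0:42      1 cl1n001
```

Here job 12345 is running (R) on node cl1n001, while job 12346 is still pending (PD) because the requested resources are not yet free.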

scancel command #

You can remove your job from the queue, e.g. in case you requested too many cores, with the command scancel <job_id>. You can also remove a running job the same way; in this case the job will be terminated first.

slurmtop program #

You can monitor the general statistics about cluster usage with the slurmtop program. Press the q key to exit.

Available queues #

We use several queues on the cluster. All compute nodes are divided into groups, and each queue can use nodes only from specific groups. The maximum execution time may also be reduced for some queues.

Queue     001–004   005–012   013–016   017–018   021–030   Total cores   Max walltime
x8core       +                                                       64           24 h
x10core                          +         +                        120           24 h
x12core                +                             +              432           24 h
mix          +         +         +         +         +              616           12 h
e5core       +         +                   +                        296           24 h
long                             *                                   10          180 h

* Only 10 cores from cl1n016 are available in the long queue.

In order to submit a job to, e.g., the x12core queue, use the parameter -p x12core for the sbatch command:

sbatch -p x12core -J big-problem --nodes=2 --ntasks-per-node=24  qbig.sh

Or you can add the following line to your job file:

#SBATCH -p x12core

The default queue is x8core.