Queue management system Slurm #
We use the Slurm resource manager to schedule all user jobs on our cluster. A job is an ordinary shell script that is submitted to the queue with the sbatch command. The script contains information about the requested resources (number of nodes, amount of RAM, required time) and may use the usual bash commands. The system executes the script only on the first reserved core, so the user is responsible for launching programs on the other cores, e.g. with the mpirun tool.
Example #
Let’s start with a simple example: a job named sleep that uses just one core and has a maximum execution time of 5 minutes.
Create the file sleep.sh with the following content:
#!/bin/bash
#SBATCH --job-name sleep
#SBATCH --nodes=1
#SBATCH --time=05:00
echo "Start date: $(date)"
sleep 60
echo " End date: $(date)"
You can enqueue this job with the sbatch command:
sbatch sleep.sh
Lines starting with the prefix #SBATCH define parameters for the sbatch command. You can also specify these parameters explicitly on the command line, e.g.:
sbatch --job-name sleep --nodes=1 --time=5:00 sleep.sh
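When a job is accepted, sbatch reports the ID assigned to it, and this ID is what squeue and scancel refer to later. As a minimal sketch, the ID can be captured in a variable with the --parsable option, which makes sbatch print only the job ID (optionally followed by a semicolon and the cluster name). The function name submit_job is made up for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: submit a job script and print only its job ID.
submit_job() {
    local out
    # --parsable prints "<job_id>" or "<job_id>;<cluster_name>"
    out=$(sbatch --parsable "$@") || return 1
    # Drop the optional ";<cluster_name>" suffix
    echo "${out%%;*}"
}

# Usage (on the cluster):
#   job_id=$(submit_job sleep.sh)
#   echo "Submitted job $job_id"
```

Any extra arguments are passed through to sbatch, so command-line parameters such as --time can still be given before the script name.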
sbatch command #
The main parameters of the sbatch command:
-D path or --chdir=path
The working directory of the job. By default, the current directory is used.
-e path/file or --error=path/file
-o path/file or --output=path/file
The file names for the standard error (stderr) and standard output (stdout) streams. By default, both streams are joined in one file, slurm-<job_id>.out, in the current directory.
--mail-type=NONE,FAIL,BEGIN,END,ALL and others
The types of events for which the user is notified by e-mail: NONE (no notifications), FAIL (job abortion), BEGIN (job start), END (job finish). You can select several events. More details are in the manual, man sbatch.
--mail-user=e-mail
The e-mail address for notifications. By default, the owner of the job.
-J name or --job-name=name
The name of the job.
-p queue or --partition=queue
The execution queue for the job. The default queue is normal.
-n N or --ntasks=N
Ask for N processes.
-N N or --nodes=N
Ask for N nodes.
--nodes=N --ntasks-per-node=M
Ask for N nodes with M processes on each node.
--cpus-per-task=N
Ask for N CPU cores for each process (e.g. for MPI+OpenMP hybrid jobs). The default is 1 CPU core per process.
--mem=size
The required memory size per node. The size is a number with one of the suffixes K, M, G. E.g., --mem=16G requests 16 GB of memory per node.
-t time or --time=time
The maximum execution time for the job. The job is terminated once this time is exceeded. The default time limit is 1 hour. The maximum time you can request is 7 days. E.g., --time=1:45:00 stops the job after 1 hour and 45 minutes.
-C list or --constraint=list
A comma-delimited list of extra constraints for nodes. The current constraints are: ib (working InfiniBand); avx, avx2, avx512 (CPUs with support for the AVX, AVX2 and AVX-512 instruction set extensions).
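As an illustration of how these parameters combine, the following sketch writes a job script for a hybrid MPI+OpenMP run on 2 nodes, with 10 processes per node and 4 cores per process, and submits it if sbatch is available. The file name hybrid.sh and the program ./my_hybrid_program are placeholders. Whether mpirun picks up the Slurm allocation automatically depends on the installed MPI library, so treat this as a starting point rather than a tested recipe for this cluster.

```shell
#!/bin/bash
# Write a hypothetical job script; the quoted 'EOF' keeps $VARIABLES unexpanded.
cat > hybrid.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hybrid
#SBATCH --partition=normal
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=10
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=1:45:00
#SBATCH --constraint=ib
#SBATCH --output=hybrid-%j.out

# One OpenMP thread per core reserved for each process.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

# ./my_hybrid_program is a placeholder for your own executable
mpirun ./my_hybrid_program
EOF

# Submit only where Slurm is installed
if command -v sbatch >/dev/null 2>&1; then
    sbatch hybrid.sh
fi
```

The %j in the output file name is replaced by the job ID, which keeps the logs of repeated runs apart.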
Some useful environment variables defined by the Slurm management system:
SLURM_SUBMIT_DIR
The directory from which the user submitted the job.
SLURM_JOB_ID
The unique ID of the job.
SLURMD_NODENAME
The hostname of the current node.
SLURM_NTASKS
The number of tasks (processes) of the job. With the default of 1 CPU core per process, this equals the number of allocated CPU cores.
You can find more information in the manual, man sbatch.
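A minimal sketch of how these variables might be used inside a job script: the function below (the name job_summary is made up for this example) prints a one-line summary and falls back to placeholder values when the script runs outside Slurm.

```shell
#!/bin/bash
# Hypothetical helper: summarize the Slurm environment of the current job.
job_summary() {
    # ${VAR:-default} substitutes a fallback when the variable is unset,
    # e.g. when the script is run by hand rather than by Slurm.
    echo "job=${SLURM_JOB_ID:-none} node=${SLURMD_NODENAME:-unknown}" \
         "tasks=${SLURM_NTASKS:-1} dir=${SLURM_SUBMIT_DIR:-$PWD}"
}

job_summary
```

Printing such a line at the top of a job makes it easier to match an output file to the job and node that produced it.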
squeue command #
You can check the status of jobs with the squeue command. To list only the jobs of a specific user, run squeue -u user. The current state of a job is shown in the ST column.
- PD — the job is waiting in the queue for requested resources.
- R — the job is running.
- Other states are documented in the manual, man squeue.
You can get the list of nodes allocated for your job in the NODELIST column.
The command squeue -l shows the requested time for each job, and squeue --start shows the estimated start time of a job.
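For a quick overview of many jobs, the state column can be summarized with standard tools. The sketch below assumes the squeue options -h (no header) and -o "%t" (print only the compact state code); the function name count_states is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: count jobs per state code (PD, R, ...).
# Reads one state code per line from standard input.
count_states() {
    sort | uniq -c | awk '{print $2 ": " $1}'
}

# Usage (on the cluster):
#   squeue -u "$USER" -h -o "%t" | count_states
```

Because the function only reads standard input, it can be tried without Slurm, e.g. with printf 'R\nPD\nR\n' | count_states.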
scancel command #
You can remove your job from the queue, e.g. if you requested too many cores, with the command scancel <job_id>. A running job can be removed the same way; in this case the job is terminated first.
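To remove several jobs at once, scancel can also select jobs by owner and state instead of by ID. A minimal sketch, assuming the --user and --state options of scancel; the function name cancel_pending is made up here.

```shell
#!/bin/bash
# Hypothetical helper: cancel all of the current user's jobs that are still
# waiting in the queue; running jobs are left untouched.
cancel_pending() {
    scancel --user="${USER:-$(id -un)}" --state=PENDING
}

# Usage (on the cluster):
#   cancel_pending
```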
slurmtop program #
You can monitor general statistics about cluster usage with the slurmtop program. Press q to exit.
Available queues #
Queue | Nodes | Total cores | Max walltime |
---|---|---|---|
normal | n[01-24,29-36] | 1280 | 7 days |
short | n[27] | 40 | 1 hour |
The default queue is normal. The maximum walltime for a job is 7 days. The more nodes a job uses, the shorter its maximum walltime. The general rule is that the number of nodes used (including partially used ones) multiplied by the walltime must not exceed 432 node-hours.
Nodes amount | Max cores | Max walltime |
---|---|---|
1 | 40 | 7 days |
2 | 80 | 7 days |
3 | 120 | 6 days |
4 | 160 | 108 h |
6 | 240 | 72 h |
8 | 320 | 54 h |
12 | 480 | 36 h |
16 | 640 | 27 h |
18 | 720 | 24 h |
24 | 960 | 18 h |
32 | 1280 | 13.5 h |
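
The table follows from the 432 node-hours rule combined with the 7-day cap. As a worked sketch, the function below (its name is made up for this example) computes the limit for a given number of nodes and prints it in the days-hours:minutes:seconds form accepted by --time. Integer division rounds down, so the result never exceeds the limit.

```shell
#!/bin/bash
# Hypothetical helper: maximum walltime for N nodes under the rule
# nodes * walltime <= 432 node-hours, capped at 7 days.
max_walltime() {
    local nodes=$1
    local cap=$((7 * 24 * 60))            # 7 days in minutes
    local minutes=$((432 * 60 / nodes))   # node-hours budget in minutes
    if [ "$minutes" -gt "$cap" ]; then
        minutes=$cap
    fi
    # Format as D-HH:MM:SS
    printf '%d-%02d:%02d:00\n' \
        $((minutes / 1440)) $((minutes % 1440 / 60)) $((minutes % 60))
}

max_walltime 1    # 7-00:00:00
max_walltime 4    # 4-12:00:00 (108 h)
max_walltime 32   # 0-13:30:00 (13.5 h)
```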
The node-hours limits may be decreased in the future. Please check this page for updates.
An additional limit applies to long jobs (more than 24 hours): all running long jobs together may use at most 18 nodes simultaneously.
The additional queue short is intended for tests; its maximum walltime is 1 hour.
The queues x10core, x12core, x20core, mix and long were completely removed in January 2023.
Additional restrictions for guest users #
Several additional restrictions apply to jobs submitted by user student.
Starting from December 3, 2021, the total number of cores available for user student is restricted to 128. Any job requesting more than 128 cores will be rejected. Several jobs may run in parallel as long as their total usage of cores does not exceed 128 cores.
Starting from February 10, 2022, the maximum time limit for jobs submitted by user student is decreased to 4 hours. Similarly to the earlier restriction on the total number of cores, an additional restriction limits the total number of used nodes. Any job requesting more than 4 nodes will be rejected. Several jobs may run in parallel as long as together they use no more than 4 nodes.