Introduction to Slurm
This knowledge article gives an overview of Slurm, describes the new job submission requirements, provides basic commands and example job scripts, and lists PBS-to-Slurm equivalents to help you convert your job scripts.
We have just completed a major upgrade of our scheduling software, moving from PBS to Slurm. Although this change will benefit us greatly in the long run, existing job scripts and workflows must first be adapted to the new scheduler.
Slurm is a modern job scheduler with capabilities that are compatible with the WSU Grid's heterogeneous hardware. It is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. It allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. It provides a framework for starting, executing, and monitoring work on the set of allocated nodes, and it arbitrates contention for resources by managing a queue of pending work.
Slurm differs from PBS in the commands used to submit and monitor jobs, the syntax used to request resources, and the way environment variables behave. In Slurm, sets of compute nodes are called partitions rather than queues (as in PBS). Resources are classified into QoSes. A QoS is a classification that determines what kind of resources your job can use. Users can request specific node features with the constraint directive. A job is given an allocation of resources to run; jobs spawn steps, which are allocated resources from within the job's allocation.
Here are some basic commands that will get you up and running with Slurm (an example workflow follows the table):
Command | Description |
freenodes | Provides freenode, partition, and QoS information |
qme | Still works; it now displays information from squeue |
sbatch | Submit a job to the batch queue system, e.g., sbatch myjob.sh, where myjob.sh is a Slurm job script. |
srun | Submit an interactive job to the batch queue system, e.g., srun --pty bash will begin an interactive shell. |
scancel <jobID> | Cancel a job, e.g., scancel 123, where 123 is a job ID. |
sq | Check current jobs in the batch queue system. |
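For example, a typical workflow using these commands might look like the following (myjob.sh and the job ID 123 are placeholders for your own script and job):
sbatch myjob.sh      # submit the batch script; Slurm prints the job ID it assigns
sq                   # check the state of your queued and running jobs
scancel 123          # cancel the job with ID 123 if it is no longer needed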
Key options to set when submitting your jobs
When submitting a job, the two key options required are the QoS and a maximum time limit for your job.
A QoS is a classification that determines what kind of resources your job can use. Our QoSes are as follows (a submission example follows the table):
QoS | Description | Directive |
primary | This contains our fastest publicly available nodes. It has a limit of 512 cores. | -q primary |
secondary | This contains our older equipment: slower nodes that may still be useful for high-throughput jobs, many-core jobs, or high-latency MPI jobs. | -q secondary |
gpu | GPU nodes may be found here. | -q gpu |
express | If you own equipment, you can submit jobs here; they will run on your own nodes and preempt other users' jobs on them. | -q express |
debug | If you have filled your allocations and need some space for testing and development, you can use the resources here. | -q debug |
requeue | Unlimited use, but your jobs may be killed and requeued. Single node only. | -q requeue |
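For example, to submit a job to the gpu QoS you would pass -q gpu to sbatch. If the job also needs a GPU allocated, a generic resource (GRES) request is typically added as well; the exact GRES specification below is illustrative and may differ on our cluster:
sbatch -q gpu --gres=gpu:1 myjob.sh   # request the gpu QoS and one GPU (GRES name is illustrative)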
Time
A maximum time limit for the job is required under all conditions. Jobs submitted without providing a time limit will be rejected by the scheduler. This can be specified with the following directive:
-t, --time=<time>
Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
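For example, each of the following is a valid time request, either as a script directive or on the command line:
#SBATCH -t 30            # 30 minutes
#SBATCH -t 4:00:00       # 4 hours
#SBATCH -t 2-12:00:00    # 2 days and 12 hours
sbatch -t 1-0:0:0 myjob.sh   # 1 day, given on the command line (myjob.sh is a placeholder)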
Sample Simple Job Script
Submit with the following command "sbatch -q primary simple.sh"
simple.sh:
#!/bin/bash
#SBATCH --job-name Simple
#SBATCH -q primary
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mem=5G
#SBATCH --constraint=avx2
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxyyyy@wayne.edu
#SBATCH -o output_%j.out
#SBATCH -e errors_%j.err
#SBATCH -t 1-0:0:0
hostname
date
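With these directives, the job is limited to a single task on one node, 5 GB of memory, and one day of walltime, and its standard output and standard error are written to files such as output_12345.out and errors_12345.err, where 12345 is replaced by the job ID (%j).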
Sample MPI Job Script
Submit with the following command "sbatch -q secondary MPI.sh"
MPI.sh:
#!/bin/bash
#SBATCH --job-name MPI
#SBATCH -q secondary
#SBATCH -N 12
#SBATCH -n 12
#SBATCH --mem=12G
#SBATCH --constraint=intel
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxyyyy@wayne.edu
#SBATCH -o output_%j.out
#SBATCH -e errors_%j.err
#SBATCH -t 7-0:0:0
hostname
date
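Note that this sample only runs hostname and date on the allocation; in a real MPI job you would typically replace those commands with an srun launch of your program (my_mpi_program is a placeholder for your own executable):
srun ./my_mpi_program   # srun starts the requested number of MPI tasks across the allocated nodes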
Interactive Job: srun -q debug -t 10:0 --pty bash
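You can also request specific resources for an interactive session. For example, the following (illustrative) command asks for one node, four tasks, 8 GB of memory, and one hour in the debug QoS:
srun -q debug -t 1:00:00 -N 1 -n 4 --mem=8G --pty bash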
PBS to Slurm Equivalents
User Commands | PBS | Slurm |
Job Submission | qsub [script_file] | sbatch [script_file] |
Job Submission - Interactive | qsub -I | srun --pty bash |
Job Deletion | qdel [job_id] | scancel [job_id] |
Job Status (by job) | qstat [job_id] | squeue -j [job_id] |
Job Status (by user) | qstat -u [user_name] | squeue -u [user_name] |
Job Details | qstat -f [job_id] | scontrol show job [job_id] |
Job hold | qhold [job_id] | scontrol hold [job_id] |
Job release | qrls [job_id] | scontrol release [job_id] |
Queue/partition list | qstat -Q | sinfo OR squeue -p [partition] (jobs in a partition) |
Node list | pbsnodes -l | sinfo -N OR scontrol show nodes |
Cluster status | qstat -a | sinfo |
GUI | xpbsmon | sview |
Like PBS, Slurm has batch scripts that are submitted with the command sbatch [script_file]. In your script, rather than #PBS, you use #SBATCH. The following table lists job specifications to help you convert your job scripts to Slurm (an example conversion follows the table).
Job Specification | PBS | Slurm |
Script directive | #PBS | #SBATCH |
Queue | -q [queue] | -p [queue] |
Node count | -l nodes=[count] | -N [min[-max]] |
CPU Count | -l ppn=[count] | -n [count] |
Wall Clock Limit | -l walltime=[hh:mm:ss] | -t [min] OR -t [days-hh:mm:ss] |
Standard Output File | -o [file_name] | -o [file_name] |
Standard Error File | -e [file_name] | -e [file_name] |
Combine stdout/err | -j oe (both to stdout) OR -j eo (both to stderr) | (use -o without -e) |
Copy Environment | -V | --export=[ALL|NONE|variables] |
Event Notification | -m abe | --mail-type=[events] |
Email Address | -M [address] | --mail-user=[address] |
Job Name | -N [name] | --job-name=[name] |
Job Restart | -r [y|n] | --requeue OR --no-requeue |
Working Directory | N/A | --workdir=[dir_name] |
Resource Sharing | -l naccesspolicy=singlejob | --exclusive OR --shared |
Memory Size | -l mem=[MB] | --mem=[mem][M|G|T] OR --mem-per-cpu=[mem][M|G|T] |
Account to charge | -W group_list=[account] | --account=[account] |
Tasks Per Node | -l mppnppn [PEs_per_node] | --ntasks-per-node=[count] |
CPUs Per Task | N/A | --cpus-per-task=[count] |
Job Dependency | -d [job_id] | --depend=[state:job_id] |
Job Project | N/A | --wckey=[name] |
Job host preference | N/A | --nodelist=[nodes] AND/OR --exclude=[nodes] |
Quality Of Service | -l qos=[name] | --qos=[name] |
Job Arrays | -t [array_spec] | --array=[array_spec] |
Generic Resources | -l other=[resource_spec] | --gres=[resource_spec] |
Licenses | N/A | --licenses=[license_spec] |
Begin Time | -A "YYYY-MM-DD HH:MM:SS" | --begin=YYYY-MM-DD[THH:MM[:SS]] |
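As a brief illustration, a PBS script that requested one node with four processors, a four-hour walltime, and 8 GB of memory (using -l nodes=1:ppn=4, -l walltime=04:00:00, and -l mem=8gb) could be rewritten with the equivalent #SBATCH directives below; the QoS, resource amounts, and program name are only examples:
#!/bin/bash
#SBATCH --job-name=myjob       # was: #PBS -N myjob
#SBATCH -q primary             # QoS, required on our cluster (was: #PBS -q [queue])
#SBATCH -N 1                   # was: #PBS -l nodes=1
#SBATCH -n 4                   # was: #PBS -l ppn=4
#SBATCH -t 4:00:00             # was: #PBS -l walltime=04:00:00
#SBATCH --mem=8G               # was: #PBS -l mem=8gb
cd $SLURM_SUBMIT_DIR           # was: cd $PBS_O_WORKDIR (optional; Slurm already starts in the submit directory)
./myprogram                    # placeholder for your own executable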
Slurm has its own environment variables, just like PBS. The table below lists common PBS environment variables and their Slurm equivalents (a usage example follows the table).
Environment | PBS | Slurm |
Job ID | $PBS_JOBID | $SLURM_JOBID |
Submit Directory | $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR |
Submit Host | $PBS_O_HOST | $SLURM_SUBMIT_HOST |
Node List | $PBS_NODEFILE | $SLURM_JOB_NODELIST |
Job Array Index | $PBS_ARRAYID | $SLURM_ARRAY_TASK_ID |
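For example, these variables can be used directly inside a job script; a minimal sketch:
echo "Job $SLURM_JOBID was submitted from $SLURM_SUBMIT_DIR on $SLURM_SUBMIT_HOST"
echo "Running on node(s): $SLURM_JOB_NODELIST"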
If you need any further assistance in converting your job scripts to Slurm, please email us at hpc@wayne.edu.