HPC Grid Tutorial: Introduction to Slurm

This knowledge article gives an overview of Slurm, the new job submission requirements, basic commands with examples, and PBS-to-Slurm equivalents for converting job scripts.

We have just completed a major upgrade of our scheduling software, from PBS to Slurm. Although this will benefit us greatly in the future, we must first adapt to it in the present.

Slurm is a modern job scheduler whose capabilities suit the WSU Grid's heterogeneous hardware. It is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. It allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work, provides a framework for starting, executing, and monitoring work on the set of allocated nodes, and arbitrates contention for resources by managing a queue of pending work.

Slurm differs from PBS in the commands used to submit and monitor jobs, the syntax for requesting resources, and how environment variables behave. In Slurm, sets of compute nodes are called partitions rather than queues (the PBS term). Resources are also classified into QoS's; a QoS is a classification that determines what kind of resources your job can use. Users can request particular node features with the constraint directive. A job is given an allocation of resources to run. Jobs spawn steps, which are allocated resources from within the job's allocation.
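
For example, inside a batch job's allocation each srun invocation launches a job step that draws on the job's resources. A minimal sketch (the program names are placeholders, not real software on the Grid):

#!/bin/bash
#SBATCH -q primary
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -t 1:0:0

# Each srun below starts a job step within the job's 4-task allocation
srun -n 2 ./preprocess_data
srun -n 4 ./run_analysis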

Here are some basic commands that will get you up and running with Slurm:

Command           Description
freenodes         Provides free node, partition, and QoS information
qme               Still works; it now reports information from squeue
sbatch            Submit a job to the batch queue system, e.g., sbatch myjob.sh, where myjob.sh is a Slurm job script
srun              Submit an interactive job to the batch queue system, e.g., srun --pty bash begins an interactive shell
scancel <jobID>   Cancel a job, e.g., scancel 123, where 123 is a job ID
sq                Check current jobs in the batch queue system
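
To see how these fit together, here is a minimal example session (the script name and job ID shown are placeholders, and your output will differ):

# Submit a batch script; sbatch prints the new job's ID
sbatch -q primary -t 1:0:0 myjob.sh
# -> Submitted batch job 123

# Check your jobs in the queue (squeue -u $USER also works)
sq

# Cancel the job using the ID reported by sbatch
scancel 123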

Key options to set when submitting your jobs

When submitting a job, the two key options required are the QoS and a maximum time limit for your job.

A QoS is a classification that determines what kind of resources your job can use. Our QoS's are as follows:

QoS         Directive      Description
primary     -q primary     Our fastest publicly available nodes; limited to 512 cores
secondary   -q secondary   Our older equipment: slower nodes that may still be useful for high-throughput jobs, many-core jobs, or high-latency MPI jobs
gpu         -q gpu         GPU nodes may be found here
express     -q express     If you own equipment, you can submit your jobs here and preempt (kick off) everyone else's jobs on your nodes
debug       -q debug       If you have filled your allocations and need somewhere to keep working, you can use the resources here
requeue     -q requeue     Unlimited use, but your jobs may be killed and requeued; single node only

Time

A maximum time limit for the job is required under all conditions. Jobs submitted without providing a time limit will be rejected by the scheduler. This can be specified with the following directive:

-t, --time=<time>

Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
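
Putting the two required options together, a submission might look like either of the following (the script name is a placeholder):

# On the command line: primary QoS with a two-hour limit
sbatch -q primary -t 2:00:00 myjob.sh

# Or as directives inside the job script itself
#SBATCH -q primary
#SBATCH -t 1-0:0:0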

Sample Simple Job Script

Submit with the following command "sbatch -q primary simple.sh"

simple.sh:

#!/bin/bash
#SBATCH --job-name Simple
#SBATCH -q primary
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mem=5G
#SBATCH --constraint=avx2
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxyyyy@wayne.edu
#SBATCH -o output_%j.out
#SBATCH -e errors_%j.err
#SBATCH -t 1-0:0:0

hostname
date

Sample MPI Job Script

Submit with the following command "sbatch -q secondary MPI.sh

MPI.sh:

#!/bin/bash
#SBATCH --job-name MPI
#SBATCH -q secondary
#SBATCH -N 12
#SBATCH -n 12
#SBATCH --mem=12G
#SBATCH --constraint=intel
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxyyyy@wayne.edu
#SBATCH -o output_%j.out
#SBATCH -e errors_%j.err
#SBATCH -t 7-0:0:0

hostname
date
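
The script above only runs hostname and date; for a real MPI job you would typically launch the program across the allocated tasks with srun (or mpirun). A sketch, assuming a hypothetical executable name and that your MPI library is already set up in your environment:

# Launches one copy of the program per allocated task (-n 12 above)
srun ./my_mpi_program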

Interactive Job: srun -q debug -t 10:0 --pty bash
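
Once the allocation is granted, the interactive command above drops you into a shell on a compute node; leaving that shell ends the job. A rough sketch of such a session:

srun -q debug -t 10:0 --pty bash
hostname   # now prints a compute node's name rather than the login node's
exit       # ends the interactive job and releases the allocation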

PBS to Slurm Equivalents

User Commands                  PBS                      Slurm
Job Submission                 qsub [script_file]       sbatch [script_file]
Job Submission - Interactive   qsub -I                  srun --pty bash
Job Deletion                   qdel [job_id]            scancel [job_id]
Job Status (by job)            qstat [job_id]           squeue -j [job_id]
Job Status (by user)           qstat -u [user_name]     squeue -u [user_name]
Job Details                    qstat -f [job_id]        scontrol show job [job_id]
Job hold                       qhold [job_id]           scontrol hold [job_id]
Job release                    qrls [job_id]            scontrol release [job_id]
Queue/partition list           qstat -Q                 squeue -p [partition]
Node list                      pbsnodes -l              sinfo -N OR scontrol show nodes
Cluster status                 qstat -a                 sinfo
GUI                            xpbsmon                  sview
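
For example, a common PBS workflow and its Slurm equivalent, side by side (the job ID 123 is a placeholder):

# PBS                     # Slurm
qsub myjob.sh             sbatch myjob.sh
qstat -u $USER            squeue -u $USER
qstat -f 123              scontrol show job 123
qdel 123                  scancel 123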

Like PBS, Slurm uses batch scripts, which are submitted with the command sbatch [script_file]. In your script, use #SBATCH directives rather than #PBS. The following table lists job specifications to help you convert your job scripts to Slurm; a converted example follows the table.

Job Specification      PBS                                                 Slurm
Script directive       #PBS                                                #SBATCH
Queue                  -q [queue]                                          -p [queue]
CPU Count              -l ppn=[count] OR -l mppwidth=[PE_count]            -n [count]
Node count             -l nodes=[count]                                    -N [min[-max]]
Wall Clock Limit       -l walltime=[hh:mm:ss]                              -t [min] OR -t [days-hh:mm:ss]
Standard Output File   -o [file_name]                                      -o [file_name]
Standard Error File    -e [file_name]                                      -e [file_name]
Combine stdout/err     -j oe (both to stdout) OR -j eo (both to stderr)    (use -o without -e)
Copy Environment       -V                                                  --export=[ALL | NONE | variables]
Event Notification     -m abe                                              --mail-type=[events]
Email Address          -M [address]                                        --mail-user=[address]
Job Name               -N [name]                                           --job-name=[name]
Job Restart            -r [y|n]                                            --requeue OR --no-requeue
Working Directory      N/A                                                 --workdir=[dir_name]
Resource Sharing       -l naccesspolicy=singlejob                          --exclusive OR --shared
Memory Size            -l mem=[MB]                                         --mem=[mem][M|G|T] OR --mem-per-cpu=[mem][M|G|T]
Account to charge      -W group_list=[account]                             --account=[account]
Tasks Per Node         -l mppnppn [PEs_per_node]                           --tasks-per-node=[count]
CPUs Per Task          N/A                                                 --cpus-per-task=[count]
Job Dependency         -d [job_id]                                         --depend=[state:job_id]
Job Project            N/A                                                 --wckey=[name]
Job host preference    N/A                                                 --nodelist=[nodes] AND/OR --exclude=[nodes]
Quality Of Service     -l qos=[name]                                       --qos=[name]
Job Arrays             -t [array_spec]                                     --array=[array_spec]
Generic Resources      -l other=[resource_spec]                            --gres=[resource_spec]
Licenses               N/A                                                 --licenses=[license_spec]
Begin Time             -A "YYYY-MM-DD HH:MM:SS"                            --begin=YYYY-MM-DD[THH:MM[:SS]]
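
As a sketch of how these directives map in practice, here is a small PBS script and an equivalent Slurm version. The resource values and program name are illustrative only; note that the Slurm version also adds the QoS required on our cluster.

PBS version:

#!/bin/bash
#PBS -N myjob
#PBS -l nodes=1:ppn=4
#PBS -l walltime=4:00:00
#PBS -l mem=8gb
#PBS -m abe
#PBS -M xxyyyy@wayne.edu
./my_program

Slurm version:

#!/bin/bash
#SBATCH --job-name myjob
#SBATCH -q primary
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -t 4:00:00
#SBATCH --mem=8G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxyyyy@wayne.edu
./my_program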

Slurm has its own environment variables, just like PBS. Here are common PBS environment variables and their Slurm equivalents; a short usage example follows the table.

Environment        PBS              Slurm
Job ID             $PBS_JOBID       $SLURM_JOBID
Submit Directory   $PBS_O_WORKDIR   $SLURM_SUBMIT_DIR
Submit Host        $PBS_O_HOST      $SLURM_SUBMIT_HOST
Node List          $PBS_NODEFILE    $SLURM_JOB_NODELIST
Job Array Index    $PBS_ARRAYID     $SLURM_ARRAY_TASK_ID
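
For example, where a PBS script used cd $PBS_O_WORKDIR and $PBS_JOBID, the Slurm version of that fragment might look like this:

# Change to the directory the job was submitted from
cd $SLURM_SUBMIT_DIR

# Tag an output file with the job ID and record the allocated nodes
echo "Job $SLURM_JOBID ran on $SLURM_JOB_NODELIST" > nodes_${SLURM_JOBID}.txt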

If you need any further assistance in converting your job scripts to Slurm, please email us at hpc@wayne.edu.