5. Slurm Workload Manager

Tux uses SLURM (Simple Linux Utility for Resource Management) as its workload manager for compute jobs and resources.

Users submit jobs to the cluster through Slurm, which places them in a queue until the system is ready to run them. Slurm selects which jobs to run, and when and where to run them, according to a predetermined policy meant to balance competing user needs and to maximize the efficient use of cluster resources.

Note

One cannot ssh into a compute node unless one has a Slurm job running on that node!

Therefore, users have no alternative but to use Slurm to access compute nodes.

Slurm can start multiple jobs on a single node, or a single job on multiple nodes. In the most typical scenario, the user submits a job to Slurm by means of a job script; Slurm then finds and allocates the resources required to execute the job. The following table provides an overview of the Slurm commands used to submit and manage jobs. The system also provides a man page for each of these commands.

5.1. Main Slurm Tools

Slurm provides a variety of tools that allow a user to manage and understand their jobs. The most central Slurm tools are:

Command    Functionality
---------  --------------------------------------------------------------------
sbatch     Submit a job script.
salloc     Create an interactive SLURM shell.
srun       Execute the argument command on the resources assigned to a job.
           Note: must be executed inside an active job (script or interactive
           environment); in most cases, using mpiexec (which in turn uses srun
           for startup) is the preferred alternative.
squeue     Print a table of submitted jobs and their state. Note: non-privileged
           users can only see their own jobs.
sinfo      Provide an overview of the cluster status.
scontrol   Query and modify the SLURM state.

Additional information about each of the Slurm tools can be obtained from their respective manual pages or by following the links in the table above.
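
For instance, the manual page for sbatch can be opened directly on a login node:

$ man sbatch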

5.2. Partitions

The compute nodes are organized in partitions. The partition short includes all compute nodes and has a time limit of 1 hour. Tux is a heterogeneous cluster consisting of several Intel CPU architectures, and a separate partition is defined for each of them. There is also a partition normal, which is the default partition on the system. The partitions defined on tux are:

Partition     Hosts         Time limit   #cores/node
-----------   -----------   ----------   -----------
short         all           1 hour       mixed
normal        tux-[1-9]     7 days       16
sandybridge   tux-[1-11]    30 days      16
haswell       tux-[12-14]   30 days      20
haswell       tux-[15-16]   30 days      24
skylake       tux-[17-24]   30 days      32
skylake       tux-[25-28]   30 days      40
cascadelake   tux-[29-30]   30 days      40

You can see more detailed information about the compute nodes, such as the current number of nodes in each partition and their state, by running the sinfo command (a Slurm command) on a login node (or any other node in the cluster).

For instance, to get detailed information about each node belonging to a given partition <partition>, use the command

$ sinfo -Nl -p <partition>

Note

Since the tux cluster is heterogeneous, the number of physical cores varies between the partitions, and this has to be accounted for when preparing job scripts (see the discussion below).
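
As a quick way to check the core count that Slurm reports for a particular node, one can for example query scontrol (the node name tux-17 below is only illustrative):

$ scontrol show node tux-17 | grep CPUTot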

5.3. Hyperthreading

Intel processors support hyperthreading, which might (or might not) increase the performance of your application. With hyperthreading, often also called multithreading, the number of MPI tasks per node in your job script is increased from the number of physical cores to twice that number; for instance, on our skylake nodes the number of tasks per node is increased from 32 to 64. Be aware that with such an increase in the number of MPI tasks per node, each process by default gets only half of the memory. If you need more memory, you have to specify it in your job script and use a node with more memory for the job to run (see the example batch scripts).
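
As a sketch, the relevant header lines for a hyperthreaded MPI job on a skylake node (32 physical cores) could look as follows; the memory request shown is illustrative only:

#SBATCH --partition=skylake         # 32 physical cores per node
#SBATCH --ntasks-per-node=64        # 2 x 32 hardware threads
#SBATCH --hint=multithread          # use hyperthreading
#SBATCH --mem-per-cpu=750MB         # roughly half the memory per task (illustrative)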

If you want to disable hyperthreading (or multithreading) for your job, you can add the following line to the header of your job script:

#SBATCH --hint=nomultithread    # do not use hyperthreading

For some jobs this can result in better performance, while for other applications hyperthreading can degrade performance significantly (in particular when using MPI). Hence, it is up to the user to test what is optimal.

Warning

Hyperthreading does not benefit all applications! Also, some applications may show improvement with some process counts but not with others, and there may be other unforeseen issues. Therefore, before using this technology in your production run, you should test your applications with and without hyperthreading. If your application runs more than two times slower with hyperthreading than without, do not use it.

5.4. sbatch -- the submit command

The common way of submitting jobs in Slurm is to use the command sbatch. Some of its most frequently used options are:

Option                       Short option       Description
---------------------------  -----------------  ---------------------------------------------
--job-name=<JobName>         -J <JobName>       Job name in the queue
--partition=<partition>      -p <partition>     Partition to use
--time=<D-HH:MM:SS>          -t <D-HH:MM:SS>    Maximum wall time
--nodes=<node>               -N <node>          Number of nodes to use
--ntasks=<tasks>             -n <tasks>         Total number of tasks for the job
--ntasks-per-node=<tasks>                       Number of tasks per node
--ntasks-per-core=<tasks>                       Number of tasks per core
--cpus-per-task=<cores>      -c <cores>         Number of cores per task
--hint=[no]multithread                          Hyperthreading yes/no
--mem=<mem>                                     Memory per node
--mem-per-cpu=<mem>                             Memory per core
--mail-type=<type>                              Send email at start/end of job
--mail-user=<email>                             Email address to use
--gres=<list>                                   Generic consumable resources (comma-separated list)
--constraint=<attribute>                        Request certain features (e.g. bigmem)

Many more options are available for sbatch; to see them, inspect the manual page for the command or the relevant section of the Slurm homepage.

Warning

Slurm uses confusing terminology, many will say! For Slurm, a CPU is a core, which means that the option --cpus-per-task=1 actually means one core per task. Furthermore, the Slurm option --mem-per-cpu refers to the memory per core.

Note

The sbatch option --ntasks-per-core=# is only suitable for compute nodes with HyperThreading enabled in hardware/BIOS, which is not always the case.
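
Whether hyperthreading is active on the nodes of a partition can, for instance, be checked by printing sinfo's threads-per-core field (%Z) next to the node names:

$ sinfo -N -p <partition> -o "%N %Z"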

The classic way of using the options to sbatch, given in the table above, is to collect them in a specially prepared file known as a job script. How to prepare such files will be presented in detail below, and we will later give several examples of such files.

How to submit a job?

Submitting a job script my-slurm-job.sh can be done with the sbatch command:

$ sbatch my-slurm-job.sh
Submitted batch job 160
$

The output Submitted batch job 160 indicates that the job was successfully submitted to the batch system and that JobID 160 was assigned to it. This JobID is used by Slurm to uniquely refer to the job; for instance, it can be used to monitor, analyze, and inspect the job both while it is running and after it has completed. Also notice that you are back at the command prompt immediately after submitting the job.

As a Slurm job runs, unless you redirect output, a file named slurm-160.out (in this case) will be produced in the directory where the sbatch command was run. You can use cat, less, or any text editor to view it. The file contains the output your program would have written to a terminal if run interactively.
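
If you prefer a different file name, the output can be redirected with the --output option of sbatch, where %j expands to the JobID, e.g. by adding a line like the following to the job script header:

#SBATCH --output=my-job-%j.out      # write output to my-job-<JobID>.out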

Because job scripts specify the desired resources for your job, you do not need to specify any resources on the command line. However, you can override or add any job parameters by providing the specific resource as a command line option to sbatch:

$ sbatch --time=1-0:0:0  <your-job-script>

Running this command will force your job to have a wall time of one day, no matter what your job script specifies.

How to set environment variables in your jobscript

It is often convenient to set environment variables that appear inside a jobscript from the command line. Slurm does not support using arbitrary variables in #SBATCH lines within a job script; for example, #SBATCH -N=$NODES will not replace $NODES with the variable's value. However, variables that appear elsewhere in the jobscript can be replaced.

Note

The sbatch flag --export may conveniently be used to pass environment variables from the command line to the jobscript. In this way, a generic jobscript can be used to start a number of related jobs.

Say that a jobscript slurm_job.sh contains the environment variables INPUT and OUTPUT, which are used to define the input and output files of the simulations. To define them from the command line, using the same (generic) jobscript, one may do

$ sbatch --export=INPUT='gauss-1.nml',OUTPUT='Result-1.h5' slurm_job.sh
$ sbatch --export=INPUT='gauss-2.nml',OUTPUT='Result-2.h5' slurm_job.sh

In this way, the variables specified on the command line are available as $INPUT and $OUTPUT in your jobscript.
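
Inside the generic jobscript, these variables are then used like ordinary shell variables; a minimal sketch (the program name and its flags follow the MPI example used elsewhere in this document) could be:

#!/bin/bash
#SBATCH --job-name=Generic-Job
#SBATCH --partition=sandybridge
#SBATCH --time=0-01:00:00

srun ./Rayleigh2D -p ${INPUT} -o ${OUTPUT}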

5.5. squeue -- the queue inspection command

After having submitted a job, you may be interested in its status. To inspect all jobs that you (here user ingves) have in the queue, use the squeue command in the following way

$ squeue -u ingves
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       155 sandybrid test-run   ingves  R       5:25      1 tux-9
       160    normal slurm-jo   ingves  R       0:25      1 tux-1
$

This shows that you currently have two jobs in the queue and that both have status (ST) running (R).

If you are only interested in a specific job, say the one with JobID 160, you can do

$ squeue -j 160
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       160    normal slurm-jo   ingves  R       2:25      1 tux-1
$

You may also ask for a list of jobs running on a given partition (-p) (and the user ingves in this case):

$ squeue -u ingves -p sandybridge
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       155 sandybrid test-run   ingves  R       7:25      1 tux-9
$
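
It is also possible to filter on job state; for instance, to list only your pending jobs one may use the -t (--states) option:

$ squeue -u ingves -t PENDING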

5.6. Submitting jobs in Slurm

There are two main job types in Slurm:

  • Batch jobs

  • Interactive jobs

By far, batch jobs are the most commonly used job type on HPC clusters, but interactive jobs still fulfill some very specific needs, as we will see. Batch jobs are resource provisions that run applications on nodes away from the user and do not require supervision or interaction. They are commonly used for applications that run for long periods of time or require little to no user input. Batch jobs are created from a job script, which provides the resource requirements and the commands for the job. Usually this is a bash (or shell) script, but it could, for instance, also be a python script. The job scripts in this document are bash scripts.

We will now consider both Slurm job types in turn, but first we start with an illustrative example!

5.6.1. Submit your first Slurm job

In our first example, we will execute the UNIX command sleep 30, i.e. a command that starts a process that does nothing other than sleep for 30 seconds. Admittedly, this is not a very useful task, but it suffices for the purpose of this illustration. If we execute the command with srun on the command line of the login node, we observe

$ time srun sleep 30
                            # ... and after about 30s ....
real 0m30.443s
user 0m0.007s
sys  0m0.016s
$

When pressing return after time srun sleep 30, nothing seems to happen for about 30 seconds, after which three lines are printed in the terminal. What happened here is that Slurm allocated a node for the job and started it there. After around 30 seconds the job finishes, the timing information for the job is reported in the terminal, and we get the command prompt back. If we, in another terminal on the login node, do

$ squeue -u ingves
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       166    normal    sleep   ingves  R       0:25      1 tux-1
$

we see that the job is running and that it has been doing so for 25 seconds.

We just demonstrated that it is possible to run jobs completely from the command line, but doing so is often overly tedious and unorganized. Therefore, we will now instead submit the same job using sbatch. First we need to create a simple job script, here called first-slurm-job.sh, and provide it as an argument to sbatch, like this

$ cat first-slurm-job.sh
#!/bin/bash
sleep 30
$
$ sbatch first-slurm-job.sh
Submitted batch job 168
$
$ squeue -u ingves
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       168    normal first-sl   ingves  R       0:25      1 tux-1
$

In this case, just after executing the command sbatch first-slurm-job.sh, we get the command prompt back immediately. This is because the job was submitted to the (default) queue, and we do not have to wait for it to finish (as was the case above). The output that this job may produce on standard output (stdout) will be written to a file named slurm-168.out, where 168 refers to the JobID of the job.

It should be remarked that neither of the two examples we just gave is very typical. First, srun is usually not executed directly from the command line. Instead, it is normally used inside the job script, as will be illustrated in the examples below. For instance, as the second line of first-slurm-job.sh we could have used srun sleep 30. Second, the job script usually contains a special header that sets options for Slurm. This is done by prefacing each option with #SBATCH. As an example, extending the script first-slurm-job.sh to

#!/bin/bash

# --- Header
#SBATCH --job-name=Second-Job
#SBATCH --partition=sandybridge
#SBATCH --time=0-01:00:00

# --- Start the job
time srun sleep 30

will set the name of the job to Second-Job, request the partition sandybridge, and set the time limit for the job to 1 hour.

5.6.2. Batch jobs

Batch jobs are submitted in Slurm with sbatch <job-script>. But how are such scripts prepared? Job scripts are essentially shell scripts with some extra parameters that set the resource requirements. They all share a similar structure consisting of three main parts:

  1. A Slurm header that starts with #SBATCH followed by an option

  2. Optionally, a list of environment variables to set and modules to load

  3. Finally, the command(s) that you want to run (typically started with srun)

As a concrete example of a Slurm job script, consider the following MPI job

#!/bin/bash

# --- Header
#SBATCH --job-name=Ex-MPI
#SBATCH --partition=sandybridge     # Partition to use
#SBATCH --time=0-01:00:00           # Time limit D-HH:MM:SS

#SBATCH --nodes=2                   # No. of nodes to use
#SBATCH --ntasks-per-node=16        # No. of physical cores per node
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread        # Do not use hyperthreading

#SBATCH --mem-per-cpu=1500MB        # Total memory / # cores

# --- Modules
ml purge
ml Intel
ml hdf5

# --- Environment variables

# --- Start the job
time srun ./Rayleigh2D -p input.nml -o output.h5

This script requests two nodes (--nodes=2) on the sandybridge partition for no more than 1 hour (header) to run the MPI job Rayleigh2D -p input.nml -o output.h5 (last line). The option --ntasks-per-node=16 sets the number of tasks per node to 16, which equals the number of physical cores on the sandybridge nodes. Plain (non-hybrid) MPI jobs correspond to a single task per core, and in Slurm this is specified by --cpus-per-task=1.

It should be noticed that options defined in the job script can be overridden by command line arguments to sbatch. For instance, according to the sbatch option table, -N controls the number of nodes to use for the job, while --ntasks-per-node specifies the number of tasks per node.

Hence, the above job script, here called submit-MPI-job.sh, can be submitted in one of several ways and, depending on how it was submitted, the job will have different characteristics:

$ sbatch submit-MPI-job.sh                                        # 2 nodes; 16 tasks/node

$ sbatch -N 4 submit-MPI-job.sh                                   # 4 nodes; 16 tasks/node

$ sbatch --ntasks-per-node=10 submit-MPI-job.sh                   # 2 nodes; 10 tasks/node

$ sbatch --nodes=4 --ntasks-per-node=12 submit-MPI-job.sh         # 4 nodes; 12 tasks/node

$ sbatch -p skylake -N 6 --ntasks-per-node=32 submit-MPI-job.sh   # 6 nodes; 32 tasks/node
                                                                  # skylake partition

In the last submission given above, we also changed the partition for the job (using the option -p).

Note

Changing partition often also requires updating the value of --ntasks-per-node to reflect the number of physical cores on the nodes involved.
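
For instance, when submitting the above script to one of the 20-core haswell nodes, one would also adjust the task count accordingly:

$ sbatch -p haswell --ntasks-per-node=20 submit-MPI-job.sh   # 2 nodes; 20 tasks/node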

Examples of job scripts for different scenarios are given in the next section.

5.6.3. Interactive jobs

There are cases where you might want interactive access to compute nodes. For instance, this could be to run software and feed it input during operation, or to test or debug calculations interactively. Recall that you cannot simply ssh into a compute node unless you have a job running on it.

To obtain interactive access to a compute node you can use salloc. It will give you an interactive shell on the compute node that Slurm allocates, i.e. you will be logged directly onto this compute node and can start working on it. To illustrate this, consider

tux:~$ salloc
salloc: Granted job allocation 187
tux-5:~$

Notice that you were here allocated the compute node tux-5, which is one of the nodes in the default partition normal.

To leave the compute node, you type exit; your allocation is then revoked and you are brought back to the login node

tux-5:~$ exit
exit
salloc: Relinquishing job allocation 187
tux:~$

Note

By default, salloc assumes the default partition normal and a rather short wall time. This behavior can be altered by using the options -p <partition> and -t DD-HH:MM:SS.
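
For example, to request an interactive shell on a skylake node for two hours, one might use:

$ salloc -p skylake -t 0-02:00:00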

Running Graphics Applications

If you want to run a graphical application (e.g. a GUI) on a compute node, the above method will not work since X forwarding is not enabled. However, you can achieve this in a two-step process: first, allocate a compute node as just described (say you were allocated node tux-5); second, from another terminal on a login node, do (notice the option -X to ssh)

tux:~$ ssh -X tux-5
Linux tux-5 5.10.0-18-amd64 #1 SMP Debian 5.10.140-1 (2022-09-02) x86_64

Last login: Fri Oct 28 20:14:10 2022 from 192.168.7.105

tux-5:~$

After the ssh login you should be able to run graphical applications. To end your allocation, first log out from your ssh connection, then exit the salloc session.

5.7. Job Script Examples

The differences between the various types of Slurm batch jobs are found in the job scripts that define them. Below we will present examples of job scripts for the following scenarios:

  • Batch jobs
    • Serial jobs

    • MPI jobs

    • OpenMP (multi-threaded)

    • Hybrid MPI/OpenMP

    • Embarrassingly parallel

    • GPU jobs

    • Run multiple processes in parallel

5.7.1. Serial Jobs

Serial jobs use only a single CPU-core. This is in contrast to parallel jobs which use multiple CPU-cores simultaneously.

Below is a sample Slurm script for submitting a serial job for running a python script (slurm-serial-job.sh):

#!/bin/bash

# --- Header
#SBATCH --job-name=Ex-Serial
#SBATCH --partition=sandybridge     # Partition to use
#SBATCH --time=0-01:00:00           # Time limit D-HH:MM:SS

#SBATCH --nodes=1                   # No. of nodes to use
#SBATCH --ntasks-per-node=1         # A single task for a serial job
#SBATCH --cpus-per-task=1           # One core per task
#SBATCH --hint=nomultithread        # Do not use hyperthreading

#SBATCH --mem-per-cpu=1500MB        # Total memory / # cores

# --- Modules
ml purge
ml Python

# --- Environment variables

# --- Start the job
time python <my-script.py>

Here we request one node (--nodes=1), a single task (--ntasks-per-node=1), and a single core for that task (--cpus-per-task=1), as is appropriate for a serial job.

5.7.2. MPI Jobs

Message Passing Interface (MPI) is a communication protocol used in particular for distributed-memory parallel jobs. This is achieved by splitting the job into many tasks, each with their own dedicated memory. One of the main advantages of MPI calculations is how easily they can be scaled up to run on many nodes.

The test job that we want to execute is Rayleigh2D -p input.nml -o output.h5, and it uses MPI. When hyperthreading is disabled, one wants one task per core, which is guaranteed by the option --cpus-per-task=1. Furthermore, the number of tasks per node is set equal to the number of physical cores on the nodes. For a node in the sandybridge partition this means setting the option --ntasks-per-node=16, since these nodes have 16 physical cores. The value of the option --nodes defines how many nodes will be used in the calculation.

Taking these considerations into account, an example Slurm job script that disables hyperthreading may look like this (slurm-mpi-job.sh):

#!/bin/bash

# --- Header
#SBATCH --job-name=Ex-MPI
#SBATCH --partition=sandybridge     # Partition to use
#SBATCH --time=0-01:00:00           # Time limit D-HH:MM:SS

#SBATCH --nodes=2                   # No. of nodes to use
#SBATCH --ntasks-per-node=16        # No. of physical cores per node
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread        # Do not use hyperthreading

#SBATCH --mem-per-cpu=1500MB        # Total memory / # cores

# --- Modules
ml purge
ml Intel
ml hdf5

# --- Environment variables

# --- Start the job
time srun ./Rayleigh2D -p input.nml -o output.h5

5.7.3. OpenMP Jobs

OpenMP is a way to do parallel computations on shared-memory machines. A plain OpenMP multi-threaded job runs a single process (task) on a single node and lets numerous threads (cores) participate in solving this task. Therefore, one must set the options --nodes=1 and --ntasks-per-node=1, while --cpus-per-task is set to a number larger than one, related to the number of physical cores on the node.

For instance, on a sandybridge node, having 16 physical cores, the following script can be used to submit a job where hyperthreading is disabled (slurm-openmp-job.sh):

#!/bin/bash

# --- Header
#SBATCH --job-name=Ex-OpenMP-NoHT
#SBATCH --partition=sandybridge     # Partition to use
#SBATCH --time=0-01:00:00           # Time limit D-HH:MM:SS

#SBATCH --nodes=1                   # No. of nodes to use
#SBATCH --ntasks-per-node=1         # A single task (process) for OpenMP
#SBATCH --cpus-per-task=16          # No. of physical cores on the node
#SBATCH --hint=nomultithread        # Do not use hyperthreading

#SBATCH --mem-per-cpu=1500MB        # Total memory / # cores


# --- Modules
ml purge
ml scuff-em

# --- Environment variables

# --- Start the job
time srun scuff-scatter < Args

Hyperthreading is enabled by the option --hint=multithread, provided it is also activated in the BIOS of the node (which can be checked with sinfo -Nl -n <nodename>). In this case, the number of logical cores is twice the number of physical cores on the node. Hence, on a sandybridge node --cpus-per-task should be set to 32, as in the following script:

#!/bin/bash

# --- Header
#SBATCH --job-name=Ex-OpenMP-HT
#SBATCH --partition=sandybridge     # Partition to use
#SBATCH --time=0-01:00:00           # Time limit D-HH:MM:SS

#SBATCH --nodes=1                   # No. of nodes to use
#SBATCH --ntasks-per-node=1         # A single task (process) for OpenMP
#SBATCH --cpus-per-task=32          # 2 x 16 physical cores (hyperthreading)
#SBATCH --hint=multithread          # Use hyperthreading

#SBATCH --mem-per-cpu=1500MB        # Total memory / # cores


# --- Modules
ml purge
ml scuff-em

# --- Environment variables

# --- Start the job
time srun scuff-scatter < Args

At this point one may ask whether hyperthreading is really helpful. It is hard to say anything in general; it may be, or it may not! You, the user, simply have to try it out for your application. For instance, for the simulations outlined by the above two job scripts, enabling hyperthreading resulted in a speed-up of about 16%.
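
Note also that many OpenMP applications read the number of threads from the environment variable OMP_NUM_THREADS. A common pattern, which could be placed in the "# --- Environment variables" part of the scripts above, is to derive it from the Slurm allocation (a sketch; whether a given application honours this variable has to be checked case by case):

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}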

5.7.4. Hybrid MPI/OpenMP Jobs

To be written!

5.7.5. Embarrassingly parallel jobs

To be written!

5.7.6. GPU Jobs

To be written!

5.7.7. Run multiple processes in parallel

With the Slurm scheduler, it is possible to use srun to natively run multiple processes in parallel and/or start a sequence of smaller tasks from one single batch job. The method supports the execution of many small tasks in parallel, enabling HTC-style workflows on HPC systems. This can be an alternative to job arrays for running a large number of smaller tasks at once in a single job, or an alternative to GNU Parallel and Pylauncher.

Your Slurm script will contain multiple srun lines, and there are several key requirements for them to run simultaneously:

  1. Ensure that each srun command asks for a fraction of the CPU and memory resources of the full job, with lines that should run simultaneously requesting no more than the job's total. Each task will start, in order, as soon as sufficient resources become available for it.

  2. Include -c1 if using 1 CPU per task, which is standard.

  3. Include & at the end of each line to have the commands run simultaneously in the background.

  4. Include wait at the end of the sequence of srun commands to avoid having the job end while the processes are running in the background.

In this example, we assume a node with 24 cores and 168 GB of memory. We have six tasks in total to run, and we want to run two at a time, each allocated half of the job's resources (12 cores and 84 GB of memory). The third task can start as soon as either of the first two ends, and so on.

The Slurm script for achieving this is

#!/bin/bash

# --- Header
#SBATCH -JSlurmParallelSrunExample  # Job name
#SBATCH --partition=sandybridge     # Partition to use
#SBATCH --time=0-01:00:00           # Time limit D-HH:MM:SS

#SBATCH --nodes=1                   # No. of nodes to use
#SBATCH --ntasks-per-node=24        # No. of physical cores per node
#SBATCH --mem-per-cpu=7G            # Memory per core

srun --quiet -n12 -c1 --mem=84G ./executable1 &
srun --quiet -n12 -c1 --mem=84G ./executable2 &
srun --quiet -n12 -c1 --mem=84G ./executable3 &
srun --quiet -n12 -c1 --mem=84G ./executable4 &
srun --quiet -n12 -c1 --mem=84G ./executable5 &
srun --quiet -n12 -c1 --mem=84G ./executable6 &
wait

5.8. Managing jobs in Slurm

Subsections to cover:

  • Monitoring running Jobs

  • Stopping or cancelling Jobs

  • Investigating finished Jobs

  • Debugging failed Jobs

5.9. SLURM Environment variables

When a job submitted with sbatch starts, numerous Slurm environment variables will be set. A few commonly used variables:

Variable               Description
---------------------  -------------------------------------------------
SLURM_JOB_ID           Useful for naming output files that won't clash.
SLURM_JOB_NAME         Name of the job.
SLURM_SUBMIT_DIR       Directory where sbatch was called.
SLURM_ARRAY_TASK_ID    The current index of your array job.
SLURM_CPUS_PER_TASK    Useful as an input for multi-threaded functions.
SLURM_NTASKS           Useful as an input for MPI functions.
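
As a small sketch of how such variables may be used inside a job script (the program and file names are illustrative only):

cd ${SLURM_SUBMIT_DIR}
srun ./my-program > result-${SLURM_JOB_ID}.dat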

Common SLURM Environment Variables

Variable                    Description
--------------------------  ------------------------------------------------------
$SLURM_JOB_ID               The Job ID.
$SLURM_JOBID                Deprecated. Same as $SLURM_JOB_ID.
$SLURM_SUBMIT_DIR           The path of the job submission directory.
$SLURM_SUBMIT_HOST          The hostname of the node used for job submission.
$SLURM_JOB_NODELIST         List of the nodes assigned to the job.
$SLURM_NODELIST             Deprecated. Same as $SLURM_JOB_NODELIST.
$SLURM_CPUS_PER_TASK        Number of CPUs per task.
$SLURM_CPUS_ON_NODE         Number of CPUs on the allocated node.
$SLURM_JOB_CPUS_PER_NODE    Count of processors available to the job on this node.
$SLURM_CPUS_PER_GPU         Number of CPUs requested per allocated GPU.
$SLURM_MEM_PER_CPU          Memory per CPU. Same as --mem-per-cpu.
$SLURM_MEM_PER_GPU          Memory per GPU.
$SLURM_MEM_PER_NODE         Memory per node. Same as --mem.
$SLURM_GPUS                 Number of GPUs requested.
$SLURM_NTASKS               Same as -n, --ntasks. The number of tasks.
$SLURM_NTASKS_PER_NODE      Number of tasks requested per node.
$SLURM_NTASKS_PER_SOCKET    Number of tasks requested per socket.
$SLURM_NTASKS_PER_CORE      Number of tasks requested per core.
$SLURM_NTASKS_PER_GPU       Number of tasks requested per GPU.
$SLURM_NPROCS               Same as -n, --ntasks. See $SLURM_NTASKS.
$SLURM_NNODES               Total number of nodes in the job's resource allocation.
$SLURM_TASKS_PER_NODE       Number of tasks to be initiated on each node.
$SLURM_ARRAY_JOB_ID         Job array's master job ID number.
$SLURM_ARRAY_TASK_ID        Job array ID (index) number.
$SLURM_ARRAY_TASK_COUNT     Total number of tasks in a job array.
$SLURM_ARRAY_TASK_MAX       Job array's maximum ID (index) number.
$SLURM_ARRAY_TASK_MIN       Job array's minimum ID (index) number.

A full list of environment variables for SLURM can be found by visiting the SLURM page on environment variables.

Note

To decrease the chance of a variable being misinterpreted, use the syntax ${NAME_OF_VARIABLE} and enclose it in double-quoted strings where possible, e.g.

echo "Completed task ${SLURM_ARRAY_TASK_ID} / ${SLURM_ARRAY_TASK_COUNT} successfully"

5.10. Additional information

External sources for additional information are: