
GPU use on NeSI

This page provides generic information about how to access NeSI's GPU cards.

For application-specific settings (e.g. OpenMP, TensorFlow on GPU, ...), please have a look at the dedicated pages listed at the end of this page.


An overview of available GPU cards is available in the Available GPUs on NeSI support page. Details about GPU cards for each system and usage limits are in the Mahuika Slurm Partitions and Māui_Ancil (CS500) Slurm Partitions support pages. Details about pricing in terms of compute units can be found in the What is an allocation? page.


Recall that the memory associated with a GPU is its VRAM, which is a separate resource from the RAM requested via Slurm. The memory values listed below are VRAM values. For the RAM available on the GPU nodes, please see Mahuika Slurm Partitions.

Request GPU resources using Slurm

To request a GPU for your Slurm job, add the following option at the beginning of your submission script:

#SBATCH --gpus-per-node=1

You can specify the type and number of GPUs you need using the following syntax:

#SBATCH --gpus-per-node=<gpu_type>:<gpu_number>

If not specified, the default GPU type is P100. For some types of GPU, you also need to specify a partition. Here is a summary of typical use cases:

  • 1 P100 GPU on Mahuika

    #SBATCH --gpus-per-node=P100:1
  • 1 P100 GPU on Māui Ancillary Nodes

    #SBATCH --partition=nesi_gpu
    #SBATCH --gpus-per-node=1
  • 2 P100 GPUs per node on Mahuika

    #SBATCH --gpus-per-node=P100:2

    You cannot ask for more than 2 P100 GPUs per node on Mahuika.

  • 1 A100 (40GB) GPU on Mahuika

    #SBATCH --gpus-per-node=A100:1
  • 2 A100 (40GB) GPUs on Mahuika

    #SBATCH --gpus-per-node=A100:2

    You cannot ask for more than 2 A100 (40GB) GPUs per node on Mahuika.

  • 1 A100-1g.5gb GPU on Mahuika

    #SBATCH --gpus-per-node=A100-1g.5gb:1

    This type of GPU is limited to 1 job per user and is recommended for development and debugging.

  • 1 A100 (80GB) GPU on Mahuika

    #SBATCH --partition=hgx
    #SBATCH --gpus-per-node=A100:1

    These GPUs are on Milan nodes, check the dedicated support page for more information.

  • 4 A100 (80GB & NVLink) GPUs on Mahuika

    #SBATCH --partition=hgx
    #SBATCH --gpus-per-node=A100:4

    These GPUs are on Milan nodes, check the dedicated support page for more information.

    You cannot ask for more than 4 A100 (80GB) GPUs per node on Mahuika.

  • 1 A100 GPU on Mahuika, regardless of the type

    #SBATCH --partition=gpu,hgx
    #SBATCH --gpus-per-node=A100:1

    With this configuration, your job will spend less time in the queue, using whichever A100 GPU is available. It may land on a regular Mahuika node (A100 40GB GPU) or on a Milan node (A100 80GB GPU).
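Putting the pieces together, a job header for this last case might look like the following sketch. The job name, CPU, memory and walltime values are placeholders to adjust for your workload:

```shell
#!/bin/bash -e
#SBATCH --job-name=A100Job      # placeholder name
#SBATCH --partition=gpu,hgx     # accept whichever A100 variant is free
#SBATCH --gpus-per-node=A100:1
#SBATCH --cpus-per-task=4       # adjust to your workload
#SBATCH --mem=8GB               # adjust to your workload
#SBATCH --time=01:00:00
```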

You can also use the --gpus-per-node option in Slurm interactive sessions, with the srun and salloc commands. For example:

srun --job-name "InteractiveGPU" --gpus-per-node 1 --cpus-per-task 8 --mem 2GB --time 00:30:00 --pty bash

will request and then start a bash session with access to a GPU, for a duration of 30 minutes.


When you use the --gpus-per-node option, Slurm automatically sets the CUDA_VISIBLE_DEVICES environment variable inside your job environment to list the index/es of the allocated GPU card/s on each node.

srun --job-name "GPUTest" --gpus-per-node=P100:2 --time 00:05:00 --pty bash
srun: job 20015016 queued and waiting for resources
srun: job 20015016 has been allocated resources
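Once the session starts, you can check which card(s) were allocated by inspecting the variable:

```shell
# show the GPU indices Slurm exposed to this job
# (prints "unset" when run outside a GPU job)
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
```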

Load CUDA and cuDNN modules

To use an Nvidia GPU card with your application, you need to load the driver and the CUDA toolkit via the environment modules mechanism:

module load CUDA/11.0.2

You can list the available versions using:

module spider CUDA

Please contact our Support Team if you need a version that is not available on the platform.


On Māui Ancillary Nodes, use module avail CUDA to list available versions.

The CUDA module also provides access to additional command line tools:

  • nvidia-smi to directly monitor GPU resource utilisation,
  • nvcc to compile CUDA programs,
  • cuda-gdb to debug CUDA applications.
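For example, a common way to keep an eye on resource usage during a run is nvidia-smi's query mode (standard nvidia-smi flags; this only produces output on a node where a GPU is allocated to you):

```shell
# report GPU utilisation and memory usage every 5 seconds in CSV form
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 5
```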

In addition, the cuDNN (NVIDIA CUDA® Deep Neural Network library) library is accessible via its dedicated module:

module load cuDNN

which will automatically load the related CUDA version. Available versions can be listed using:

module spider cuDNN

Example Slurm script

The following Slurm script illustrates a minimal example to request a GPU card, load the CUDA toolkit and query some information about the GPU:

#!/bin/bash -e
#SBATCH --job-name=GPUJob   # job name (shows up in the queue)
#SBATCH --time=00-00:10:00  # Walltime (DD-HH:MM:SS)
#SBATCH --gpus-per-node=1   # GPU resources required per node
#SBATCH --cpus-per-task=2   # number of CPUs per task (1 by default)
#SBATCH --mem=512MB         # amount of memory per node

# load CUDA module
module purge
module load CUDA/11.0.2

# display information about the available GPUs
nvidia-smi

# check the value of the CUDA_VISIBLE_DEVICES variable
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

Save this in a file (e.g. gpu-job.sl) and submit it using:

sbatch gpu-job.sl
The content of job output file would look like:

$ cat slurm-20016124.out

The following modules were not unloaded:
   (Use "module --force purge" to unload all):

  1) slurm   2) NeSI
Wed May 12 12:08:27 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:05:00.0 Off |                    0 |
| N/A   29C    P0    23W / 250W |      0MiB / 12198MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
CUDA_VISIBLE_DEVICES=0


CUDA_VISIBLE_DEVICES=0 indicates that this job was allocated to CUDA GPU index 0 on this node. It is not a count of allocated GPUs.
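If you request several cards, the variable contains a comma-separated list of indices (e.g. CUDA_VISIBLE_DEVICES=0,1), so counting its entries gives the number of allocated GPUs. A small shell sketch, using "0,1" as an illustrative value in place of what Slurm would set:

```shell
# "0,1" stands in for the value Slurm would set for a 2-GPU job
CUDA_VISIBLE_DEVICES="0,1"
ngpus=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
echo "allocated GPUs: $ngpus"   # prints "allocated GPUs: 2"
```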

NVIDIA Nsight Systems and Compute profilers

Nsight Systems is a system-wide analysis tool, particularly good for profiling CPU-GPU interactions. It is provided on Mahuika via the Nsight-Systems module:

module load Nsight-Systems/2020.5.1
Load `PyQt/5.12.1-gimkl-2020a-Python-3.8.2` module prior to running `nsys-ui`
nsys --version
NVIDIA Nsight Systems version 2020.5.1.85-5ee086b

This module gives you access to the nsys command line tool or the nsys-ui graphical interface.

Nsight Compute is a profiler for CUDA kernels. It is accessible on Mahuika using the Nsight-Compute module:

module load Nsight-Compute/2020.3.0
Load `PyQt/5.12.1-gimkl-2020a-Python-3.8.2` module prior to running `ncu-ui`
ncu --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2020 NVIDIA Corporation
Version 2020.3.0.0 (build 29307467) (public-release)

Then you can use the ncu command line tool or the ncu-ui graphical interface.


The nsys-ui and ncu-ui tools require access to a display server, either via X11 or a Virtual Desktop. You also need to load the PyQt module beforehand:

module load PyQt/5.12.1-gimkl-2020a-Python-3.8.2
module load Nsight-Systems/2020.5.1
nsys-ui  # this will work only if you have a graphical session
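On a compute node you would typically collect a profile non-interactively with the nsys command line tool and inspect the report afterwards. A sketch, assuming a hypothetical ./my_app binary and the nsys stats subcommand available in this Nsight Systems release:

```shell
# record a system-wide trace of the application into my_report.qdrep
nsys profile -o my_report ./my_app
# print summary statistics from the collected trace
nsys stats my_report.qdrep
```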

Application- and toolbox-specific support pages

The following pages provide additional information for supported applications:

And programming toolkits: