Models

To avoid unnecessary storage use, we maintain read-only copies of popular models in `/opt/nesi/models`. Please use these where you can rather than downloading your own copy. If you need a model that is not listed here, please contact our Support Team with the model name, source, and a brief description of your use case.

Available models

| Model | Licence | Path | Slurm |
| --- | --- | --- | --- |
| Llama 3.1 | Meta Llama 3.1 | `/opt/nesi/models/gguf/llama3.1/llama3.1-8b.gguf` | `#SBATCH --gpus-per-node=l4:1` |
| Llama 3.1 | Meta Llama 3.1 | `/opt/nesi/models/gguf/llama3.1/llama3.1-70b.gguf` | `#SBATCH --partition=milan`<br>`#SBATCH --gpus-per-node=a100:1` |
| DeepSeek-R1 | MIT | `/opt/nesi/models/gguf/deepseek-r1/deepseek-r1-7b.gguf` | `#SBATCH --gpus-per-node=l4:1` |
| DeepSeek-R1 | MIT | `/opt/nesi/models/gguf/deepseek-r1/deepseek-r1-32b.gguf` | `#SBATCH --partition=genoa`<br>`#SBATCH --gpus-per-node=a100:1` |
| DeepSeek-R1 | MIT | `/opt/nesi/models/gguf/deepseek-r1/deepseek-r1-70b.gguf` | `#SBATCH --partition=milan`<br>`#SBATCH --gpus-per-node=a100:1` |
| Qwen3 | Apache 2.0 | `/opt/nesi/models/gguf/qwen3/qwen3-14b.gguf` | `#SBATCH --gpus-per-node=l4:1` |
| Qwen3 | Apache 2.0 | `/opt/nesi/models/gguf/qwen3/qwen3-32b.gguf` | `#SBATCH --partition=genoa`<br>`#SBATCH --gpus-per-node=a100:1` |
| Qwen2.5 | Apache 2.0 | `/opt/nesi/models/gguf/qwen2.5/qwen2.5-7b.gguf` | `#SBATCH --gpus-per-node=l4:1` |
| Qwen2.5 | Apache 2.0 | `/opt/nesi/models/gguf/qwen2.5/qwen2.5-14b.gguf` | `#SBATCH --gpus-per-node=l4:1` |
| Gemma 3 | Gemma | `/opt/nesi/models/gguf/gemma3/gemma3-27b.gguf` | `#SBATCH --partition=genoa`<br>`#SBATCH --gpus-per-node=a100:1` |

The Slurm column shows the minimum GPU flags required to run each model. Your actual throughput will also depend on how busy the queue is. See Hardware for a full list of available GPUs.
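As a minimal sketch of how one row of the table maps onto a job script, the example below copies the path and GPU flag for the Llama 3.1 8B model into a Slurm batch script and checks that the shared file is readable before doing any work. The job name, wall time, and the `model_ready` helper are illustrative assumptions, not part of the platform; the inference command itself is left as a placeholder because it depends on the tool you use.

```shell
#!/bin/bash
# Hypothetical job name and placeholder wall time; adjust to your workload.
#SBATCH --job-name=llama31-8b
#SBATCH --time=00:10:00
# GPU flag copied from the Slurm column of the table.
#SBATCH --gpus-per-node=l4:1

# Path copied from the Path column of the table.
# It can be overridden by setting MODEL in the environment.
MODEL="${MODEL:-/opt/nesi/models/gguf/llama3.1/llama3.1-8b.gguf}"

# Illustrative helper: succeed only if the file exists and is readable.
model_ready() {
    [ -f "$1" ] && [ -r "$1" ]
}

if model_ready "$MODEL"; then
    echo "Using shared model: $MODEL"
    # Point your inference tool at "$MODEL" here instead of
    # downloading or copying the file into your own storage.
else
    echo "Model not found or not readable: $MODEL" >&2
fi
```

For the larger models, add the `--partition` line shown in the same table row alongside the `--gpus-per-node` line.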

L4 GPUs have no double-precision floating point

The L4 is an inference-optimised GPU. It is suitable for running quantised models but should not be used for model training or workflows that require FP64 precision.
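If a job script might land on either GPU type, a guard like the following sketch can warn you when it is running on an L4. It assumes `nvidia-smi` is available on the compute node and that it reports the device name in a form such as `NVIDIA L4`; the `is_l4` helper is a hypothetical name introduced for illustration.

```shell
#!/bin/bash

# Illustrative helper: succeed if the given GPU name is exactly "L4"
# or ends in " L4" (so names such as "NVIDIA L40S" do not match).
is_l4() {
    case "$1" in
        "L4"|*" L4") return 0 ;;
        *) return 1 ;;
    esac
}

# Query the name of the first allocated GPU, if the tool is present.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_name="$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"
    if is_l4 "$gpu_name"; then
        echo "Warning: running on an L4; avoid training or FP64 workloads." >&2
    fi
fi
```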

- [Ollama](../Software/Available_Applications/ollama.md)
- [Hardware](../Batch_Computing/Hardware.md)