Known Issues HPC3
Below is a list of issues that we're actively working on. We hope to have these resolved soon. This is intended to be a temporary page.
For differences between the new platforms and Mahuika, see the separate, more permanent page on differences from Mahuika.
Recently fixed
- All compute nodes now have the same large /tmp directories, allow you to ssh into running jobs, and have fast access to the cluster filesystems.
Access
OnDemand (including Jupyter)
The resources dedicated to interactive work via a web browser are limited, so computations requiring large amounts of memory or many CPU cores are not yet supported there.
Slurm jobs can be submitted, but only from the Clusters > NeSI HPC Shell Access dropdown menu, which opens a standard terminal window in the browser. Watch a demo here.
Login via ssh
Authentication
Currently, when you log in to the new platform through a proxy, you will be prompted to authenticate twice unless you have set up an ssh key.
MobaXterm
MobaXterm's session login process is not compatible with Tuakiri two-factor authentication.
You can still use ssh from a terminal within MobaXterm. However, if you would normally have used MobaXterm, we recommend OnDemand for file browsing, file transfer (for files under 9.8 GB), and terminal access. Watch a demo of how to use MobaXterm on the new platform.
Software
As was already the case on Mahuika's Milan nodes (which ran a Rocky 8 OS), some of our environment modules cause system software to stop working. For example, after running module load Perl, the svn command stops working. This usually happens when the module loads LegacySystem/7 as a dependency. The solutions are to ask us to rebuild the problem environment module, or to not have it loaded while doing other things.
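If you are unsure whether a loaded module has pulled in LegacySystem/7, you can check before running system tools such as svn. The following minimal Python sketch is not a NeSI-provided tool. The function name legacy_modules_loaded is hypothetical, and the sketch assumes your module system exports the colon-separated LOADEDMODULES environment variable, as Lmod and Environment Modules normally do.

```python
import os


def legacy_modules_loaded(loaded=None, prefix="LegacySystem/"):
    """Return the loaded modules whose names start with `prefix`.

    `loaded` defaults to the LOADEDMODULES environment variable, a
    colon-separated list of module names such as "Perl/<version>:LegacySystem/7".
    (Hypothetical helper; assumes the module system sets LOADEDMODULES.)
    """
    if loaded is None:
        loaded = os.environ.get("LOADEDMODULES", "")
    return [name for name in loaded.split(":") if name.startswith(prefix)]


if __name__ == "__main__":
    hits = legacy_modules_loaded()
    if hits:
        print("Warning: system tools such as svn may fail; loaded:", ", ".join(hits))
    else:
        print("No LegacySystem module loaded.")
```

If it reports LegacySystem/7, unload the module that pulled it in, or start a fresh shell, before using the affected system tool.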
QChem and any other software using node-locked licences won't work on nodes that are not yet registered with that licence.
Delft3D_FM did not work in Mahuika's Milan partition, so it probably needs rebuilding.
MPI software built with 2020 or earlier toolchains (e.g. intel-2020a) may not work correctly across nodes. We recommend trying a more recent toolchain, e.g. intel-2022a.
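To spot modules built with an older toolchain, you can look for the toolchain suffix in the module name. This Python sketch is hypothetical. It assumes the toolchain appears as <name>-<year><letter>, like the intel-2020a and intel-2022a examples, and it flags only toolchains from 2020 or earlier.

```python
import re

# Hypothetical pattern: a toolchain suffix such as "intel-2020a" or "foss-2022b"
# (letters, a hyphen, a four-digit year and a one-letter release).
TOOLCHAIN_RE = re.compile(r"(?P<name>[A-Za-z]+)-(?P<year>\d{4})[a-z]\b")


def old_toolchain(module_name, cutoff_year=2020):
    """Return the toolchain suffix if it is from cutoff_year or earlier, else None."""
    match = TOOLCHAIN_RE.search(module_name)
    if match and int(match.group("year")) <= cutoff_year:
        return match.group(0)
    return None
```

You could run it over the names printed by module list. A match only means the module may be affected, not that it is broken.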
Please let us know if you find any additional problems.
Slurm
Compute nodes
None of the three Mahuika hugemem nodes are present yet, but the largest of the new Genoa nodes have 1.5 TB of memory.
GPUs
If you request a GPU without specifying its type, you will be allocated an arbitrary one, so please always specify a GPU type.
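In a batch script, Slurm's --gpus-per-node option accepts an explicit type in the form <type>:<count>. The small Python helper below is a hypothetical sketch that builds such a line and refuses an untyped request. The type name A100 is only an illustrative assumption, so check the NeSI documentation for the GPU type names actually configured on the cluster.

```python
def gpu_directive(gpu_type, count=1):
    """Build an #SBATCH line requesting `count` GPUs of an explicit type.

    Uses Slurm's --gpus-per-node=<type>:<count> form. Valid type names are
    cluster-specific; "A100" in the example below is an assumption.
    """
    if not gpu_type:
        raise ValueError("Always specify a GPU type; untyped requests get an arbitrary GPU.")
    if count < 1:
        raise ValueError("count must be at least 1")
    return f"#SBATCH --gpus-per-node={gpu_type}:{count}"


if __name__ == "__main__":
    print("#!/bin/bash -e")
    print(gpu_directive("A100", 1))  # hypothetical GPU type name
```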
BadConstraints
This uninformative message can appear in squeue output as the reason a job is pending when the job has been submitted to both the milan and genoa partitions (the default behaviour). It does not appear to reflect a real problem, just a side-effect of the mechanism we use to direct jobs to the right-sized node(s).
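To confirm that a pending job shows BadConstraints only for this benign reason, you can list your pending jobs with their partitions and reasons. The Python sketch below is hypothetical, and its function names are not part of any NeSI tool. It uses squeue's %i, %P and %r format fields (job ID, partition and reason) and treats BadConstraints as expected only when the job targets both milan and genoa.

```python
import getpass
import subprocess


def split_pending(lines):
    """Split squeue lines of the form "<jobid> <partitions> <reason>".

    Returns (benign, other): benign holds jobs whose reason is BadConstraints
    and whose partition list includes both milan and genoa; other holds the rest.
    """
    benign, other = [], []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        job_id, partitions, reason = parts
        reason = reason.strip().strip("()")  # tolerate "(Reason)" style output
        # Assumption: multi-partition jobs are listed comma-separated, e.g. "milan,genoa".
        targets = set(partitions.split(","))
        if reason == "BadConstraints" and {"milan", "genoa"} <= targets:
            benign.append((job_id, partitions, reason))
        else:
            other.append((job_id, partitions, reason))
    return benign, other


def pending_jobs():
    """Return the current user's pending jobs as squeue text lines (needs squeue on PATH)."""
    result = subprocess.run(
        ["squeue", "--noheader", "--user", getpass.getuser(),
         "--states=PENDING", "--format=%i %P %r"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

On a login node, split_pending(pending_jobs()) returns the two lists. Anything in the second list is worth a closer look.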
Email
The Slurm option --mail-type has no effect yet.
Freezer Filesystems
If you have a large number of files, the s3cmd du command will fail. If you want to use s3cmd du, we advise bundling your files with an archiving tool such as tar to reduce the total number of files before adding them to Freezer.
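On the command line this is typically done with tar -czf. The Python sketch below does the same with the standard tarfile module, packing a directory into a single compressed archive before upload. The function name bundle_directory is hypothetical, and the archive must be written outside the directory being packed.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir, archive_path):
    """Pack every regular file under src_dir into one gzip-compressed tar archive.

    Returns the number of files written. Uploading one archive instead of
    many small files keeps the number of objects stored in Freezer low.
    """
    src = Path(src_dir).resolve()
    archive = Path(archive_path).resolve()
    if src in archive.parents:
        raise ValueError("write the archive outside the directory being packed")
    count = 0
    with tarfile.open(str(archive), "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src's parent so the archive unpacks into one directory.
                tar.add(str(path), arcname=str(path.relative_to(src.parent)))
                count += 1
    return count
```

The resulting archive can then be uploaded as a single object, for example with s3cmd put data.tar.gz s3://<your-bucket>/, where the bucket name is a placeholder.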