Early User Info

This page lists the new versions of software and new features available as Delta transitions to RedHat 9 OS stack in the fall of 2025.

What is not Changing?

These should function as usual with no noticeable changes:

  • job scheduler: no change to partition names

  • filesystems: no changes to /projects and /work. See the IME section regarding /ime going away.

  • containers: apptainer with existing containers

What is Changing?

OS Update

The operating system has been updated from RedHat (RH) 8.8 to 9.4. The table below lists the changes in the OS kernel, glibc, and the OS-provided GCC compiler.

       OS       Linux Kernel   glibc   OS GCC
Old    RH 8.8   4.18.0         2.28    8.5.0
New    RH 9.4   5.14.0         2.34    11.4.1

NVIDIA Driver and CUDA

The NVIDIA driver and base CUDA are updated to a more recent (but not the latest) release, as shown in the following table:

       NVIDIA Driver   Base CUDA
Old    550.163.01      12.4
New    570.148.08      12.8

Note that CUDA 11.8 will also be available from the new programming environment but is not yet ready for testing.

Default Programming Environment

There are larger changes to the default programming environment compared to the current production environment on Delta.

New Programming Environment (PE)

The compiler, MPI implementation and other base packages will be provided by the Cray Programming Environment (CrayPE), similar to how the default programming environment is provided on DeltaAI.

The default environment will be based on the GNU CrayPE PrgEnv-gnu. The default MPI implementation will be Cray’s MPICH.

         GCC compiler   Module names                  CUDA module             MPI          MPI module
Old PE   gcc            gcc/11.4.0                    cuda/11.8.0             OpenMPI      openmpi
New PE   gcc            PrgEnv-gnu, gcc-native/13.2   cudatoolkit/25.3_12.8   Cray MPICH   cray-mpich

Other Cray PEs are available, such as PrgEnv-nvidia (NVIDIA HPC SDK compilers) and PrgEnv-cray (Cray compilers). All programming environments are set by default to use the cray-mpich module.
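To try one of the other programming environments, swap the PrgEnv meta-module; cray-mpich stays loaded and the compiler wrappers pick up the new compilers. A minimal sketch using standard CrayPE module commands:

```shell
# Switch from the default GNU environment to the NVIDIA HPC SDK compilers.
# The compiler wrappers (cc, CC, ftn) then invoke the NVIDIA compilers.
module swap PrgEnv-gnu PrgEnv-nvidia

# Return to the default GNU environment.
module swap PrgEnv-nvidia PrgEnv-gnu
```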

Use the module list command to view the modules loaded by default:

[gbauer@dt-testlogin01 ~]$ module list

Currently Loaded Modules:
  1) gcc-native/13.2      6) cray-libsci/25.03.0    11) craype-accel-nvidia80
  2) craype/2.7.34        7) PrgEnv-gnu/8.6.0       12) cue-login-env/1.1
  3) libfabric/1.22.0     8) cray-dsmml/0.3.1       13) slurm-env/0.1
  4) craype-network-ofi   9) craype-x86-milan       14) default
  5) cray-mpich/8.1.32   10) cudatoolkit/25.3_12.8

Use of Compiler Wrappers

The CrayPE compiler wrappers cc, CC, and ftn are recommended when building C, C++, and Fortran libraries and applications. The wrappers automatically add the include and library paths for MPI, GPU RDMA, and CUDA.

The CrayPE also provides the MPI compiler wrappers mpicc, mpicxx/mpic++, and mpifort/mpif77/mpif90. These can be used for CPU MPI codes, but they require additional include paths and libraries when compiling GPU-aware libraries for GPU RDMA, as described in the MPI man page (see: man mpi).
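As a minimal sketch (the source file name is a placeholder), an MPI C program builds with the cc wrapper without any explicit -I or -L flags for MPI or CUDA:

```shell
# The cc wrapper adds the MPI, CUDA, and GPU RDMA (GTL) paths automatically.
cc -O2 -o hello_mpi hello_mpi.c

# The MPI wrapper also works, but only covers CPU MPI codes out of the box.
mpicc -O2 -o hello_mpi hello_mpi.c
```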

The following environment variables have been set to help use the compiler wrappers:

Environment Variable     Default Setting
CC                       cc
CXX                      CC
FC                       ftn
MPICC                    mpicc
MPICXX                   mpicxx
MPIF77                   mpif77
MPIF90                   mpif90
CMAKE_C_COMPILER         cc
CMAKE_CXX_COMPILER       CC
CMAKE_Fortran_COMPILER   ftn
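These variables can be passed straight through to a build system. A sketch with CMake (CMake reads the CC/CXX/FC environment variables on its own; the CMAKE_* values shown here are passed explicitly on the command line):

```shell
# Configure a project so CMake uses the CrayPE compiler wrappers.
cmake -DCMAKE_C_COMPILER="$CMAKE_C_COMPILER" \
      -DCMAKE_CXX_COMPILER="$CMAKE_CXX_COMPILER" \
      -DCMAKE_Fortran_COMPILER="$CMAKE_Fortran_COMPILER" \
      -S . -B build
cmake --build build
```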

Support for GPU RDMA

The Cray Programming Environments: PrgEnv-gnu, PrgEnv-nvidia and PrgEnv-cray support GPU RDMA. Compiler and runtime support is configured by default for PrgEnv-gnu and PrgEnv-nvidia.

To enable support for GPU RDMA, the environment variable MPICH_GPU_SUPPORT_ENABLED needs to be set:

export MPICH_GPU_SUPPORT_ENABLED=1

If you see

aborting job:
MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked

then the environment variable was set, but the executable was not properly linked with -lmpi_gtl_cuda; building with the cc, CC, or ftn compiler wrappers links this library automatically.
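Putting the pieces together, a GPU-aware MPI job might be built and launched as follows (a sketch; the source file name and task counts are placeholders):

```shell
# Build with the wrapper so -lmpi_gtl_cuda is linked automatically.
cc -O2 -o gpu_mpi gpu_mpi.c

# Enable GPU RDMA at run time, then launch.
export MPICH_GPU_SUPPORT_ENABLED=1
srun --ntasks=8 --gpus-per-task=1 ./gpu_mpi
```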

NCCL

Please load the aws-ofi-nccl module so that NCCL uses the appropriate high-speed network provider.

This module provides the AWS OFI network transport plugin for NCCL,
optimized for Cray systems with CXI interconnect.

Dependencies (must be loaded first):
- cudatoolkit/25.3_12.8
- libfabric/1.22.0
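A minimal load sequence might look like the following (the dependency modules are loaded by default in the new PE, but loading them explicitly is harmless):

```shell
# Load the dependencies first, then the NCCL network plugin module.
module load cudatoolkit/25.3_12.8
module load libfabric/1.22.0
module load aws-ofi-nccl
```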

Python

Several Python modules are available for use:

miniforge3-python
pytorch-conda/2.8
tensorflow-conda/2.18

Use the module spider command to find other packages provided by modules.

When installing Python packages, especially mpi4py with GPU support, we recommend setting the MPICC environment variable as follows:

MPICC="cc -shared" pip install mpi4py
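After installation, a quick sanity check (a sketch; run from within a job allocation) confirms that mpi4py sees multiple ranks:

```shell
# Each of the two tasks should report its own rank out of 2.
srun --ntasks=2 python -c \
  "from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.Get_rank(), 'of', c.Get_size())"
```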

Open OnDemand

The OnDemand instance for Jupyter and the Desktop applications is in internal testing. We will make an announcement once it is available.

What is going away?

IME

The /ime caching front-end to /work will not be available on Delta after the upgrade to RH9. Please use /work/hdd or request space on /work/nvme if you have a use case that required use of /ime.

OpenMPI

At the moment, only the cray-mpich module is supported. Observed OpenMPI performance is less than half that of the cray-mpich implementation, so redeploying OpenMPI is a lower priority.

How to Access

Login Node Access

Please use the following login node for access to the new configuration.

Login nodes

dt-login04.delta.ncsa.illinois.edu

How to Run Jobs

You must be logged into one of the test login nodes in order to run jobs with the new configuration.

There are compute nodes booted with the new configuration and a default Slurm reservation called RH9 has been added to your environment to direct jobs to those nodes.

At the moment, 1/4 of the CPU nodes, 1/4 of the A100 nodes, and 1/4 of the A40 nodes are available for use.
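As a sketch, an interactive test job on the new nodes might be requested like this (the account name is a placeholder; the RH9 reservation is applied by default when submitting from the test login node):

```shell
# Request one A100 GPU for 30 minutes on the interactive partition.
# Replace "myaccount" with your project account.
srun --account=myaccount --partition=gpuA100x4-interactive \
     --nodes=1 --ntasks=1 --gpus=1 --time=00:30:00 --pty /bin/bash
```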

To list the nodes available in the reservation:

[gbauer@dt-login04 ~]$ sinfo --long | grep $SLURM_RESERVATION
PARTITION              AVAIL  TIMELIMIT   JOB_SIZE ROOT OVERSUBS     GROUPS  NODES       STATE RESERVATION NODELIST
cpu                       up 2-00:00:00 1-infinite   no       NO        all      2    reserved         RH9 cn[001,027]
cpu-interactive           up    1:00:00        1-4   no       NO        all      2    reserved         RH9 cn[001,027]
cpu-preempt               up 2-00:00:00 1-infinite   no       NO        all      2    reserved         RH9 cn[001,027]
full                      up 1-00:00:00 1-infinite   no       NO        all      6    reserved         RH9 cn[001,027],gpua[007,010],gpub[054,075]
gpuA100x4*                up 2-00:00:00 1-infinite   no       NO        all      2    reserved         RH9 gpua[007,010]
gpuA100x4-interactive     up    1:00:00        1-4   no       NO        all      2    reserved         RH9 gpua[007,010]
gpuA100x4-preempt         up 2-00:00:00 1-infinite   no       NO        all      2    reserved         RH9 gpua[007,010]
gpuA40x4                  up 2-00:00:00 1-infinite   no       NO        all      2    reserved         RH9 gpub[054,075]
gpuA40x4-interactive      up    1:00:00        1-4   no       NO        all      2    reserved         RH9 gpub[054,075]
gpuA40x4-preempt          up 2-00:00:00 1-infinite   no       NO        all      2    reserved         RH9 gpub[054,075]