Programming Environment (Building Software)

C, C++, and Fortran: Traditional HPC

The DeltaAI programming environment is provided by HPE/Cray as the Cray User Environment. The Cray Compiler Environment provides the compiler wrappers cc, CC, and ftn for building C, C++, and Fortran codes (serial, OpenMP, and/or MPI). Use the Cray compiler wrappers with the PrgEnv-<compiler_family> module of your choice. There are three tested, functional programming environment modules on DeltaAI: PrgEnv-gnu, PrgEnv-cray, and prgenv-nvidia (note the lowercase for nvidia). PrgEnv-gnu is loaded by default.

Typical flags

  PrgEnv-gnu (or cray)    OpenMP      MPI or serial
  cc, CC, or ftn          -fopenmp    no special flags
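As a sketch, building an MPI + OpenMP code under a chosen compiler family might look like the following (the source file names are illustrative; the module and wrapper commands require a DeltaAI login node):

```shell
# Swap compiler families if desired (PrgEnv-gnu is loaded by default)
module swap PrgEnv-gnu PrgEnv-cray

# The wrappers add the MPI include and link flags automatically
cc  -O2 -fopenmp hello.c   -o hello_c    # C: MPI + OpenMP
ftn -O2 -fopenmp hello.f90 -o hello_f    # Fortran: MPI + OpenMP
```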

Compiler Recommendations

The NVIDIA Grace Hopper Tuning Guide has a section on compilers, with recommendations on flags and compiler vendors.

CUDA

The nvcc compiler for CUDA is available via the cudatoolkit module (loaded by default). DeltaAI currently provides CUDA 12.9 through the NVIDIA HPC SDK 25.5.

$ module list cudatoolkit
Currently Loaded Modules Matching: cudatoolkit
  1) cudatoolkit/25.5_12.9

$ which nvcc
/opt/nvidia/hpc_sdk/Linux_aarch64/25.5/compilers/bin/nvcc
$ nvcc --version
Cuda compilation tools, release 12.9, V12.9.41
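As a quick smoke test of the CUDA toolchain, a sketch like the following compiles a kernel for the Hopper GPUs in the Grace Hopper nodes (the saxpy.cu file name is illustrative; sm_90 is the Hopper compute capability):

```shell
# Target compute capability 9.0 (NVIDIA Hopper / GH200)
nvcc -arch=sm_90 -O2 saxpy.cu -o saxpy
./saxpy
```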

Python

Note

More information about Python on DeltaAI is in Software - Python.

You always have your choice of Python version, since it is easy to install one yourself via a variety of methods (conda-forge, miniforge, and so on). There are basic Python modules on DeltaAI provided by HPE/Cray (cray-python); these include mpi4py and numpy. There are also conda-forge builds with more software in them, such as python/miniforge3_pytorch (which also includes mpi4py). For the latest information on DeltaAI Python modules, search with module spider for python and miniforge:

arnoldg@gh-login03:~> module spider python
----------------------------------------------------------------------------
  cray-python:
----------------------------------------------------------------------------
     Versions:
        cray-python/3.11.5
        cray-python/3.11.7

arnoldg@gh-login03:~> module spider miniforge

---------------------------------------------------------------------------------------------------------------------------------------
  python/miniforge3_pytorch: python/miniforge3_pytorch/2.5.0
---------------------------------------------------------------------------------------------------------------------------------------
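A sketch of using the HPE/Cray build, assuming the default module paths (the one-liner simply confirms that the bundled packages import):

```shell
module load cray-python
python -c "import numpy, mpi4py; print(numpy.__version__, mpi4py.__version__)"
```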

Note: to build your own mpi4py, follow this recipe (MPICC="cc -shared" is required):

MPICC="cc -shared" pip install mpi4py
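After installing, one quick way to confirm which MPI library mpi4py is linked against is to print the MPI library version string. This helper is illustrative (not part of the DeltaAI modules) and falls back gracefully when mpi4py is absent:

```python
def mpi_library_info():
    """Return the first line of the MPI library version string,
    or a fallback message if mpi4py is not installed."""
    try:
        from mpi4py import MPI
        # Get_library_version() reports the underlying MPI implementation,
        # e.g. a Cray MPICH banner when built with MPICC="cc -shared"
        return MPI.Get_library_version().splitlines()[0]
    except ImportError:
        return "mpi4py not installed"

print(mpi_library_info())
```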

GPUDirect for MPI+CUDA

The default module environment on DeltaAI loads craype-accel-nvidia90 and cudatoolkit; no additional module loads are needed. To enable GPUDirect (GPU-aware MPI) with Cray MPICH, set the following environment variable before running your application:

export MPICH_GPU_SUPPORT_ENABLED=1

Warning

Without MPICH_GPU_SUPPORT_ENABLED=1, MPI operations on GPU buffers will segfault.

After building your code (with cc, CC, or ftn), verify that the libmpi_gtl_cuda library is linked into your application; this library is required for GPUDirect support. The loaded modules and environment settings must match between compile/link time and runtime.

$ ldd osu_reduce | grep gtl
      libmpi_gtl_cuda.so.0 => /opt/cray/pe/lib64/libmpi_gtl_cuda.so.0 (0x0000ffffa35d0000)
$ export MPICH_GPU_SUPPORT_ENABLED=1
$ srun --nodes=2 --ntasks-per-node=1 osu_reduce -d cuda --validation -m131072:131072

# OSU MPI-CUDA Reduce Latency Test v7.5
# Datatype: MPI_INT.
# Size       Avg Latency(us)        Validation
131072                 50.72              Pass
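In a batch job, this typically means exporting the variable in the script before srun. A minimal sketch follows (the resource counts and application name are placeholders, not DeltaAI-specific recommendations):

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-node=4

export MPICH_GPU_SUPPORT_ENABLED=1   # required for GPU-aware MPI with Cray MPICH
srun ./my_mpi_cuda_app               # placeholder application name
```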

NCCL

Multi-node GPU communication for deep learning frameworks (PyTorch, TensorFlow) uses NCCL (NVIDIA Collective Communication Library). On DeltaAI, the AWS OFI NCCL plugin bridges NCCL to the Slingshot 11 high-speed interconnect via libfabric’s CXI provider.

As of CPE 25.09, the nccl-ofi-plugin module is loaded by default:

$ module list nccl
Currently Loaded Modules Matching: nccl
  1) nccl-ofi-plugin/1.18.0-cuda129

No additional module loads are needed for NCCL multi-node communication. The plugin is in use when the NCCL_DEBUG=INFO output shows Selected provider is cxi.

To verify NCCL is using the Slingshot network in your job, set:

export NCCL_DEBUG=INFO
export NCCL_SOCKET_IFNAME=hsn

Look for these lines in the job output to confirm correct operation:

Selected provider is cxi
Loaded net plugin AWS Libfabric
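One way to script this check against a job log (slurm-&lt;jobid&gt;.out is the Slurm default output file name; adjust the LOG path if your job writes elsewhere):

```shell
# Check the job's output file for the CXI provider line
LOG="slurm-${SLURM_JOB_ID:-demo}.out"
if grep -q "Selected provider is cxi" "$LOG" 2>/dev/null; then
    echo "NCCL is using the Slingshot (CXI) provider"
else
    echo "WARNING: CXI provider line not found in $LOG"
fi
```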

See PyTorch Multi-Node for a complete multi-node training example.

Visual Studio Code

Warning

These VS Code pages are under construction.

The following pages provide step-by-step instructions on how to use VS Code, in different configurations, on DeltaAI.