Programming Environment (Building Software)
C, C++, and Fortran: Traditional HPC
The DeltaAI programming environment is provided by HPE/Cray as the Cray User Environment. The Cray Compiler Environment provides the compiler wrappers cc, CC, and ftn for building C, C++, and Fortran codes (serial, OpenMP, and/or MPI). Use the Cray compiler wrappers together with the PrgEnv-<compiler_family> module of your choice. There are three tested, functional programming environment modules on DeltaAI: PrgEnv-gnu, PrgEnv-cray, and prgenv-nvidia (note the lowercase for nvidia). PrgEnv-gnu is loaded by default.
| PrgEnv-gnu (or cray) | openmp | MPI or serial |
|---|---|---|
| cc, CC, or ftn | -fopenmp | no special flags |
Compiler Recommendations
The NVIDIA Grace Hopper Tuning Guide has a section on compilers, with recommendations on flags and compiler vendors.
CUDA
The nvcc compiler for CUDA is available via the cudatoolkit module (loaded by default).
DeltaAI currently provides CUDA 12.9 through the NVIDIA HPC SDK 25.5.
$ module list cudatoolkit
Currently Loaded Modules Matching: cudatoolkit
1) cudatoolkit/25.5_12.9
$ which nvcc
/opt/nvidia/hpc_sdk/Linux_aarch64/25.5/compilers/bin/nvcc
$ nvcc --version
Cuda compilation tools, release 12.9, V12.9.41
Python
Note
More information about Python on DeltaAI is in Software - Python.
You can always choose your own Python version, since it is easy to install one yourself via a variety of methods (conda-forge, miniforge, and so on). There are basic Python modules on DeltaAI provided by HPE/Cray: cray-python; these include mpi4py and numpy. There are also conda-forge builds with more software in them, such as python/miniforge3_pytorch (which also includes mpi4py). For the latest information on DeltaAI Python modules, search with module spider using python and anaconda as search terms:
arnoldg@gh-login03:~> module spider python
----------------------------------------------------------------------------
cray-python:
----------------------------------------------------------------------------
Versions:
cray-python/3.11.5
cray-python/3.11.7
arnoldg@gh-login03:~> module spider miniforge
---------------------------------------------------------------------------------------------------------------------------------------
python/miniforge3_pytorch: python/miniforge3_pytorch/2.5.0
---------------------------------------------------------------------------------------------------------------------------------------
Note: to build your own mpi4py, follow this recipe (MPICC="cc -shared" is required):
MPICC="cc -shared" pip install mpi4py
GPUDirect for MPI+CUDA
The default module environment on DeltaAI loads craype-accel-nvidia90 and
cudatoolkit; no additional module loads are needed.
To enable GPUDirect (GPU-aware MPI) with Cray MPICH, set the following
environment variable before running your application:
export MPICH_GPU_SUPPORT_ENABLED=1
Warning
Without MPICH_GPU_SUPPORT_ENABLED=1, MPI operations on GPU buffers will
segfault.
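Because a missing setting segfaults rather than producing a clear error, a job script or application can check the environment and fail fast instead. A minimal sketch using only the Python standard library (the function name is ours, not part of any DeltaAI or Cray tooling):

```python
import os
import sys

def require_gpu_aware_mpi():
    """Abort with a clear message if Cray MPICH GPU support was not requested.

    MPICH_GPU_SUPPORT_ENABLED=1 must be set before MPI operates on GPU
    buffers; without it, the MPI call segfaults instead of reporting an error.
    """
    if os.environ.get("MPICH_GPU_SUPPORT_ENABLED") != "1":
        sys.exit("error: set MPICH_GPU_SUPPORT_ENABLED=1 before running GPU-aware MPI")

# Normally exported in the job script; set here so the check passes.
os.environ["MPICH_GPU_SUPPORT_ENABLED"] = "1"
require_gpu_aware_mpi()  # returns silently when the variable is set
```

Calling this once at startup, before the first MPI operation on a GPU buffer, turns a hard-to-diagnose crash into an immediate, readable failure.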
After building your code (with cc, CC, or ftn), verify that the
libmpi_gtl_cuda library is linked into your application. This library is
required for GPUDirect support. The module and environment settings should
match both at compile/link time and at runtime.
$ ldd osu_reduce | grep gtl
libmpi_gtl_cuda.so.0 => /opt/cray/pe/lib64/libmpi_gtl_cuda.so.0 (0x0000ffffa35d0000)
$ export MPICH_GPU_SUPPORT_ENABLED=1
$ srun --nodes=2 --ntasks-per-node=1 osu_reduce -d cuda --validation -m131072:131072
# OSU MPI-CUDA Reduce Latency Test v7.5
# Datatype: MPI_INT.
# Size Avg Latency(us) Validation
131072 50.72 Pass
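The ldd check above can also be scripted, for example as part of a build sanity check. A small sketch using only the Python standard library (the binary path is illustrative; point it at your own executable, such as ./osu_reduce):

```python
import subprocess

def links_gtl_cuda(binary):
    """Return True if libmpi_gtl_cuda appears in the binary's ldd output."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True)
    return "libmpi_gtl_cuda" in result.stdout

# /bin/ls is not an MPI program, so this reports False; run it on an
# executable built with cc, CC, or ftn to confirm GPUDirect linkage.
print(links_gtl_cuda("/bin/ls"))
```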
NCCL
Multi-node GPU communication for deep learning frameworks (PyTorch, TensorFlow) uses NCCL (NVIDIA Collective Communication Library). On DeltaAI, the AWS OFI NCCL plugin bridges NCCL to the Slingshot 11 high-speed interconnect via libfabric’s CXI provider.
As of CPE 25.09, the nccl-ofi-plugin module is loaded by default:
$ module list nccl
Currently Loaded Modules Matching: nccl
1) nccl-ofi-plugin/1.18.0-cuda129
No additional module loads are needed for NCCL multi-node communication. When
NCCL_DEBUG=INFO output shows Selected provider is cxi, the plugin is in use.
To verify NCCL is using the Slingshot network in your job, set:
export NCCL_DEBUG=INFO
export NCCL_SOCKET_IFNAME=hsn
Look for these lines in the job output to confirm correct operation:
Selected provider is cxi
Loaded net plugin AWS Libfabric
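Checking for those lines can be automated when scanning job logs across many runs. A sketch using only the Python standard library (the function name and sample log text are illustrative, not actual NCCL output format guarantees):

```python
def nccl_uses_slingshot(log_text):
    """Check NCCL_DEBUG=INFO output for signs the AWS OFI plugin selected CXI."""
    return ("Selected provider is cxi" in log_text
            and "Loaded net plugin AWS Libfabric" in log_text)

# Abbreviated sample of what a healthy job log might contain.
sample = """\
node01: NCCL INFO NET/OFI Selected provider is cxi
node01: NCCL INFO Loaded net plugin AWS Libfabric
"""
print(nccl_uses_slingshot(sample))  # True when both confirmation lines appear
```

If this returns False for a job, NCCL likely fell back to TCP sockets, which will sharply reduce multi-node training throughput.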
See PyTorch Multi-Node for a complete multi-node training example.
Visual Studio Code
Warning
These VS Code pages are under construction.
The following pages provide step-by-step instructions on how to use VS Code, in different configurations, on DeltaAI.