Programming Environment (Building Software)

Important

In the summer/fall of 2025, Delta is transitioning from a RedHat 8 based OS stack to a RedHat 9 based OS stack. During the transition, some nodes will run the “old” RH8 stack while others run the new RH9 stack.

Two versions of the documentation are maintained during the transition. You are currently looking at the pages for the new default RH9 stack; for documentation on the old RH8 stack, go to the Red Hat 8 page.

To see what’s new in Red Hat 9 and what you have to change to transition, please see the Early User Info page.

The Delta programming environment supports the GNU, AMD (AOCC), Cray, and NVIDIA HPC compilers.

Modules provide access to the compiler and MPI environment.

AMD recommended compiler flags for GNU, AOCC, and Intel compilers for Milan processors can be found in the AMD Compiler Options Quick Reference Guide for Epyc 7xx3 processors.
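As a minimal sketch (the source and binary names are placeholders; the guide linked above lists the full set of recommended options), a Milan-tuned GCC build might look like:

# Placeholder file names; -march=znver3 targets the Milan (Zen 3) core in GCC.
gcc -O3 -march=znver3 -o mycode mycode.c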

New Programming Environment (PE)

The compiler, MPI implementation and other base packages are provided by the Cray Programming Environment (CrayPE), similar to how the default programming environment is provided on DeltaAI.

The default environment will be based on the GNU CrayPE PrgEnv-gnu. The default MPI implementation will be Cray’s MPICH. There is no “mpirun” available on the system. Use srun/salloc instead.
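As a minimal sketch (the account, partition, and executable names are placeholders), an MPI program is launched with srun rather than mpirun:

# Placeholders: account_name, the cpu partition, and ./my_mpi_app.
salloc --account=account_name --partition=cpu --nodes=2 --ntasks-per-node=4 --time=00:30:00
srun ./my_mpi_app        # srun replaces mpirun under the Cray PE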

          GCC compiler   Module names                   CUDA                     MPI          Module name
New PE    gcc            PrgEnv-gnu, gcc-native/13.2    cudatoolkit/25.3_12.8    Cray MPICH   cray-mpich

Other Cray PEs are available, such as PrgEnv-nvidia (NVIDIA HPC SDK compilers) and PrgEnv-cray (Cray compilers). All programming environments use the cray-mpich module by default.
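For example, switching the active programming environment is a module operation; depending on how the module system is configured, a load may auto-swap the PE families or an explicit swap may be required:

# One of these switches from the default GNU PE to the NVIDIA PE.
module load PrgEnv-nvidia                 # Lmod-style family auto-swap
module swap PrgEnv-gnu PrgEnv-nvidia      # classic Cray PE swap
module list                               # confirm the active PrgEnv and cray-mpich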

Cray xthi.c Sample Code

Document: XC Series User Application Placement Guide CLE 6.0 UP01 S-2496 | HPE Support

This code can be compiled using the methods shown above. The code appears in some of the batch script examples below to demonstrate core placement options.

#define _GNU_SOURCE

#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sched.h>
#include <mpi.h>
#include <omp.h>

/* Borrowed from util-linux-2.13-pre7/schedutils/taskset.c */
static char *cpuset_to_cstr(cpu_set_t *mask, char *str)
{
  char *ptr = str;
  int i, j, entry_made = 0;
  for (i = 0; i < CPU_SETSIZE; i++) {
    if (CPU_ISSET(i, mask)) {
      int run = 0;
      entry_made = 1;
      for (j = i + 1; j < CPU_SETSIZE; j++) {
        if (CPU_ISSET(j, mask)) run++;
        else break;
      }
      if (!run)
        sprintf(ptr, "%d,", i);
      else if (run == 1) {
        sprintf(ptr, "%d,%d,", i, i + 1);
        i++;
      } else {
        sprintf(ptr, "%d-%d,", i, i + run);
        i += run;
      }
      while (*ptr != 0) ptr++;
    }
  }
  ptr -= entry_made;
  *ptr = 0;
  return(str);
}

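/* Each MPI rank spawns OpenMP threads; every thread reports its rank,
   thread ID, hostname, and current core affinity mask. */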
int main(int argc, char *argv[])
{
  int rank, thread;
  cpu_set_t coremask;
  char clbuf[7 * CPU_SETSIZE], hnbuf[64];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  memset(clbuf, 0, sizeof(clbuf));
  memset(hnbuf, 0, sizeof(hnbuf));
  (void)gethostname(hnbuf, sizeof(hnbuf));
  #pragma omp parallel private(thread, coremask, clbuf)
  {
    thread = omp_get_thread_num();
    (void)sched_getaffinity(0, sizeof(coremask), &coremask);
    cpuset_to_cstr(&coremask, clbuf);
    #pragma omp barrier
    printf("Hello from rank %d, thread %d, on %s. (core affinity = %s)\n",
            rank, thread, hnbuf, clbuf);
  }
  MPI_Finalize();
  return(0);
}
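A sketch of compiling xthi with the Cray compiler wrapper and running it under Slurm with explicit core binding follows; the account and partition names are placeholders, and the binding options shown are only one reasonable choice:

#!/bin/bash
# Sketch: 2 ranks x 4 threads on one node with core binding.
# Placeholders: account_name and the cpu partition.
#SBATCH --account=account_name
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=4
#SBATCH --time=00:05:00

cc -fopenmp -o xthi xthi.c       # cc is the CrayPE wrapper; it links cray-mpich automatically

export OMP_NUM_THREADS=4
export OMP_PLACES=cores
srun --cpu-bind=cores ./xthi     # each thread prints its rank, thread ID, and core affinity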

A version of xthi (affinity/Xthi.c) is also available from ORNL:

% git clone https://github.com/olcf/XC30-Training

CUDA

CUDA compilers (nvcc) are included in the cudatoolkit module, which is loaded by default.

nv* commands available when nvhpc is loaded:

[arnoldg@dt-login03 namd]$ nv
nvaccelerror             nvidia-bug-report.sh     nvlink
nvaccelinfo              nvidia-cuda-mps-control  nv-nsight-cu
nvc                      nvidia-cuda-mps-server   nv-nsight-cu-cli
nvc++                    nvidia-debugdump         nvprepro
nvcc                     nvidia-modprobe          nvprof
nvcpuid                  nvidia-persistenced      nvprune
nvcudainit               nvidia-powerd            nvsize
nvdecode                 nvidia-settings          nvunzip
nvdisasm                 nvidia-sleep.sh          nvvp
nvextract                nvidia-smi               nvzip
nvfortran                nvidia-xconfig

See the NVIDIA HPC SDK page for more information.

The compute capability for A100 GPUs is 8.0, for A40 GPUs it is 8.6 and for H200 GPUs it is 9.0.
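For example, a minimal nvcc invocation can target a specific architecture; the source and output names below are placeholders:

# Pick the -arch value that matches the GPU you will run on.
nvcc -arch=sm_80 -o mykernel mykernel.cu    # A100 (compute capability 8.0)
nvcc -arch=sm_86 -o mykernel mykernel.cu    # A40 (8.6)
nvcc -arch=sm_90 -o mykernel mykernel.cu    # H200 (9.0)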

Note: The Multi-Process Service (MPS) is not enabled on Delta and there are no plans to support it in the future.

HIP/ROCm (AMD MI100)

Note

If using hipcc on the login nodes, add --offload-arch=gfx908 to the flags to match the GPU on the MI100 node.

To access the development environment for the gpuMI100x8 partition, start a job on one of its nodes with srun or sbatch.

Next, prepend /opt/rocm/bin, where the HIP and ROCm tools are installed, to your PATH.

A sample script to obtain an interactive xterm through Slurm is shown below; pass your account name as the first argument:

#!/bin/bash -x

# Match the account (first argument) to a "Project" returned by the "accounts" command.
MYACCOUNT=$1
GPUS="--gpus-per-node=1"
PARTITION=gpuMI100x8-interactive
srun --tasks-per-node=1 --nodes=1 --cpus-per-task=4 \
  --mem=16g \
  --partition=$PARTITION \
  --time=00:30:00 \
  --account=$MYACCOUNT \
  $GPUS --x11 \
  xterm

AMD HIP development environment on gpud01 (setting the path on the compute node):

[arnoldg@gpud01 bin]$ export PATH=/opt/rocm/bin:$PATH
[arnoldg@gpud01 bin]$ hipcc
No Arguments passed, exiting ...
[arnoldg@gpud01 bin]$
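A minimal compile sketch follows; the file names are placeholders, and --offload-arch=gfx908 matches the MI100 GPUs as noted above:

export PATH=/opt/rocm/bin:$PATH
hipcc --offload-arch=gfx908 -o vector_add vector_add.hip    # placeholder source/output names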

See the AMD HIP documentation and AMD ROCm documentation for more information.

Visual Studio Code

Note

The Code Server (VS Code) app in Open OnDemand provides an easy method to use VS Code in a web browser.

The following pages provide step-by-step instructions on how to use VS Code, in different configurations, on Delta.