Programming Environment (Building Software)

Warning

As Delta is upgraded to Red Hat 9 in the winter of 2025/26, we are transitioning the documentation. You have arrived at an old, stale documentation link. Please update your links to the main Delta documentation page.

The Delta programming environment supports the GNU, AMD (AOCC), Cray, and NVIDIA HPC compilers.

Modules provide access to the compiler + MPI environment.

AMD's recommended compiler flags for the GNU, AOCC, and Intel compilers on Milan processors can be found in the AMD Compiler Options Quick Reference Guide for EPYC 7xx3 processors.
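
As a hedged illustration only (the AMD guide above is authoritative), a common baseline for targeting Milan (Zen 3) with GCC or AOCC might look like the following; the source file name is hypothetical.

# Hypothetical example: baseline optimization flags targeting Milan (Zen 3).
# GCC 11+ and AOCC (clang-based) both accept -march=znver3; consult the AMD
# quick reference guide for the full recommended flag set.
gcc   -O3 -march=znver3 -o app app.c
clang -O3 -march=znver3 -o app app.c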

New Programming Environment (PE)

The compiler, MPI implementation and other base packages are provided by the Cray Programming Environment (CrayPE), similar to how the default programming environment is provided on DeltaAI.

The default environment will be based on the GNU CrayPE PrgEnv-gnu. The default MPI implementation will be Cray’s MPICH. There is no “mpirun” available on the system. Use srun/salloc instead.
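
As a minimal sketch (assuming the module names in the table below and a hypothetical source file hello_mpi.c), building and launching an MPI program under the default environment looks roughly like this:

# Sketch only: module names follow the table below; exact versions may differ.
module load PrgEnv-gnu              # GNU-based CrayPE (typically the default)
cc -O2 -o hello_mpi hello_mpi.c     # cc/CC/ftn are the CrayPE compiler wrappers; they link Cray MPICH automatically
srun -n 4 ./hello_mpi               # launch with srun inside a job allocation; mpirun is not available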

          GCC compiler    Module names                   CUDA                     MPI           Module name
New PE    gcc             PrgEnv-gnu, gcc-native/13.2    cudatoolkit/25.3_12.8    Cray MPICH    cray-mpich

Other Cray PEs, such as PrgEnv-nvidia (NVIDIA HPC SDK compilers) and PrgEnv-cray (Cray compilers), are also available. All programming environments use the cray-mpich module by default.
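
Switching environments follows the usual CrayPE pattern of swapping PrgEnv modules; a sketch (module names as above, versions may differ):

# Sketch: swap from the default GNU environment to the NVIDIA HPC SDK environment.
module swap PrgEnv-gnu PrgEnv-nvidia
cc --version    # the cc/CC/ftn wrappers now invoke the NVIDIA compilers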

Cray xthi.c Sample Code

Document: XC Series User Application Placement Guide CLE 6.0 UP01 S-2496 | HPE Support

This code can be compiled using the methods shown above. The code appears in some of the batch script examples below to demonstrate core placement options.

#define _GNU_SOURCE

#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sched.h>
#include <mpi.h>
#include <omp.h>

/* Borrowed from util-linux-2.13-pre7/schedutils/taskset.c */
static char *cpuset_to_cstr(cpu_set_t *mask, char *str)
{
  char *ptr = str;
  int i, j, entry_made = 0;
  for (i = 0; i < CPU_SETSIZE; i++) {
    if (CPU_ISSET(i, mask)) {
      int run = 0;
      entry_made = 1;
      for (j = i + 1; j < CPU_SETSIZE; j++) {
        if (CPU_ISSET(j, mask)) run++;
        else break;
      }
      if (!run)
        sprintf(ptr, "%d,", i);
      else if (run == 1) {
        sprintf(ptr, "%d,%d,", i, i + 1);
        i++;
      } else {
        sprintf(ptr, "%d-%d,", i, i + run);
        i += run;
      }
      while (*ptr != 0) ptr++;
    }
  }
  ptr -= entry_made;
  *ptr = 0;
  return(str);
}

int main(int argc, char *argv[])
{
  int rank, thread;
  cpu_set_t coremask;
  char clbuf[7 * CPU_SETSIZE], hnbuf[64];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  memset(clbuf, 0, sizeof(clbuf));
  memset(hnbuf, 0, sizeof(hnbuf));
  (void)gethostname(hnbuf, sizeof(hnbuf));
  #pragma omp parallel private(thread, coremask, clbuf)
  {
    thread = omp_get_thread_num();
    (void)sched_getaffinity(0, sizeof(coremask), &coremask);
    cpuset_to_cstr(&coremask, clbuf);
    #pragma omp barrier
    printf("Hello from rank %d, thread %d, on %s. (core affinity = %s)\n",
            rank, thread, hnbuf, clbuf);
  }
  MPI_Finalize();
  return(0);
}
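
As a hedged sketch, xthi.c might be compiled with the CrayPE wrapper and launched with explicit core binding as follows (the rank/thread counts and binding options are illustrative only):

# Compile with OpenMP enabled using the CrayPE C wrapper (PrgEnv-gnu).
cc -fopenmp -O2 -o xthi xthi.c

# Illustrative launch: 2 ranks per node, 4 threads per rank, bound to cores.
export OMP_NUM_THREADS=4
srun --ntasks-per-node=2 --cpus-per-task=4 --cpu-bind=cores ./xthi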

A version of xthi is also available from ORNL:

% git clone https://github.com/olcf/XC30-Training
% ls XC30-Training/affinity/Xthi.c

CUDA

CUDA compilers (nvcc) are included in the cudatoolkit module which is loaded by default.

nv* commands when nvhpc is loaded

[arnoldg@dt-login03 namd]$ nv
nvaccelerror             nvidia-bug-report.sh     nvlink
nvaccelinfo              nvidia-cuda-mps-control  nv-nsight-cu
nvc                      nvidia-cuda-mps-server   nv-nsight-cu-cli
nvc++                    nvidia-debugdump         nvprepro
nvcc                     nvidia-modprobe          nvprof
nvcpuid                  nvidia-persistenced      nvprune
nvcudainit               nvidia-powerd            nvsize
nvdecode                 nvidia-settings          nvunzip
nvdisasm                 nvidia-sleep.sh          nvvp
nvextract                nvidia-smi               nvzip
nvfortran                nvidia-xconfig

See the NVIDIA HPC SDK page for more information.

The compute capability for A100 GPUs is 8.0, for A40 GPUs it is 8.6 and for H200 GPUs it is 9.0.
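
As an illustrative sketch (the source file name is hypothetical), these compute capabilities map directly to nvcc's -arch flag:

# Hypothetical examples: target a specific GPU architecture with nvcc.
nvcc -arch=sm_80 -o app app.cu    # A100 (compute capability 8.0)
nvcc -arch=sm_86 -o app app.cu    # A40  (compute capability 8.6)
nvcc -arch=sm_90 -o app app.cu    # H200 (compute capability 9.0)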

Note: The Multi-Process Service (MPS) is not enabled on Delta and there are no plans to support it in the future.

HIP/ROCm (AMD MI100)

Note

If using hipcc on the login nodes, add --offload-arch=gfx908 to the flags to match the GPU on the MI100 node.

To access the development environment for the gpuMI100x8 partition, start a job on one of its nodes with srun or sbatch.

Next, prepend /opt/rocm/bin, where the HIP and ROCm tools are installed, to your PATH.

A sample script to obtain an interactive xterm through Slurm is shown below:

#!/bin/bash -x

# Usage: ./interactive_xterm.sh <account_name>
# Match the account to a "Project" returned by the "accounts" command.
MYACCOUNT=$1
GPUS=--gpus-per-node=1
PARTITION=gpuMI100x8-interactive
srun --tasks-per-node=1 --nodes=1 --cpus-per-task=4 \
  --mem=16g \
  --partition=$PARTITION \
  --time=00:30:00 \
  --account=$MYACCOUNT \
  $GPUS --x11 \
  xterm

AMD HIP development environment on gpud01 (setting the path on the compute node):

[arnoldg@gpud01 bin]$ export PATH=/opt/rocm/bin:$PATH
[arnoldg@gpud01 bin]$ hipcc
No Arguments passed, exiting ...
[arnoldg@gpud01 bin]$
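
A minimal sketch of compiling a HIP program for the MI100, per the note above (the source file name is hypothetical):

# Hypothetical example: build for the MI100 (gfx908) after prepending /opt/rocm/bin to PATH.
hipcc --offload-arch=gfx908 -o saxpy saxpy.hip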

See the AMD HIP documentation and AMD ROCm documentation for more information.

Visual Studio Code

Note

The Code Server (VS Code) app in Open OnDemand provides an easy method to use VS Code in a web browser.

The following pages provide step-by-step instructions on how to use VS Code, in different configurations, on Delta.