Containers
Apptainer (formerly Singularity)
Container support on DeltaAI is provided by Apptainer.
Docker images can be converted to the Apptainer SIF format with the apptainer pull command. Commands can be run inside a container with the apptainer run command.
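As a minimal sketch of the pull-then-run workflow described above: the image URI (docker://ubuntu:22.04) and the output filename are illustrative assumptions, not images provided by DeltaAI. The guard only lets the snippet be pasted on a machine where Apptainer is not installed.

```shell
# Assumptions: the image URI and output filename below are illustrative only.
IMAGE_URI="docker://ubuntu:22.04"
SIF_FILE="ubuntu_22.04.sif"

if command -v apptainer >/dev/null 2>&1; then
    # Convert the Docker image to a SIF file in the current directory
    apptainer pull "$SIF_FILE" "$IMAGE_URI"
    # Run a single command inside the resulting container
    apptainer run "$SIF_FILE" cat /etc/os-release
else
    echo "apptainer not found; run this on a node where it is installed" >&2
fi
```

Note that apptainer pull refuses to overwrite an existing SIF file, so remove the old file (or choose a new name) before pulling again.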
If Apptainer caching in ~/.apptainer causes $HOME quota issues, set the environment variable APPTAINER_CACHEDIR to a different location, such as a directory under /scratch. See also the Apptainer build environment documentation.
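A minimal sketch of redirecting the cache, assuming a per-user directory under /scratch. The CACHE_BASE variable and the apptainer_cache directory name are hypothetical; substitute the scratch path that applies to your allocation.

```shell
# Hypothetical location: replace CACHE_BASE with your own directory under /scratch.
CACHE_BASE="${CACHE_BASE:-/scratch/${USER:-$(id -un)}}"
export APPTAINER_CACHEDIR="$CACHE_BASE/apptainer_cache"

# Create the directory; report (rather than abort) if the parent is not writable.
mkdir -p "$APPTAINER_CACHEDIR" 2>/dev/null \
    || echo "cannot create $APPTAINER_CACHEDIR; adjust CACHE_BASE" >&2

echo "Apptainer cache: $APPTAINER_CACHEDIR"
```

The export only affects the current shell. To make it persistent, one option is to place the export line in your shell startup file.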
Your $HOME is automatically available from containers run via Apptainer.
You can pip3 install --user against a container’s Python (use PYTHONUSERBASE with this option), set up virtual environments, or similar while using a containerized application.
Just run the container’s /bin/bash (or use apptainer shell <container> for a quick look from a login node).
Below is an srun example of that with TensorFlow:
arnoldg@gh002:~> cat ~/bin/salloc.sh
#!/bin/bash
salloc \
--mem=64g \
--nodes=1 \
--ntasks-per-node=1 \
--cpus-per-task=8 \
--partition=ghx4 \
--time=00:50:00 \
--job-name=generic \
--gpus-per-node=1
arnoldg@gh002:~> ~/bin/salloc.sh
salloc: Granted job allocation 50737
salloc: Nodes gh076 are ready for job
arnoldg@gh-login02:~/apptainer/NGC/tensorflow> srun apptainer run --nv --bind /projects /sw/user/NGC_containers/tensorflow_24.09-tf2-py3.sif /bin/bash --login
================
== TensorFlow ==
================
NVIDIA Release 24.09-tf2 (build 109642209)
TensorFlow Version 2.16.1
Container image Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright 2017-2024 The TensorFlow Authors. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 12.6 driver version 560.35.03 with kernel driver version 535.183.06.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
python --version
Python 3.10.12
exit
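The pip3 install --user approach mentioned earlier can be sketched as follows. This is an illustration under assumptions: the PYTHONUSERBASE directory name and the package name are hypothetical, and the image path is the TensorFlow container used in the session above. Keeping a separate PYTHONUSERBASE per container avoids mixing packages built against different Python versions in the default ~/.local.

```shell
# Image path taken from the TensorFlow session above.
SIF=/sw/user/NGC_containers/tensorflow_24.09-tf2-py3.sif
# Hypothetical per-container install directory under $HOME.
export PYTHONUSERBASE="$HOME/.local-tf-24.09"
# Illustrative package name; substitute the package you need.
PKG=requests

if command -v apptainer >/dev/null 2>&1 && [ -f "$SIF" ]; then
    # $HOME is visible inside the container and host environment variables
    # are passed through by default, so pip installs under $PYTHONUSERBASE.
    apptainer exec "$SIF" pip3 install --user "$PKG"
else
    echo "apptainer or $SIF not available; nothing installed" >&2
fi
```

Export the same PYTHONUSERBASE in later jobs that use this container so that Python finds the installed packages.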
NVIDIA NGC Containers
DeltaAI provides NVIDIA NGC Docker containers that are pre-built with Apptainer. Look for the latest binary containers in /sw/external/NGC/.
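One way to find the newest image for a given framework is to version-sort the filenames. The sketch below is a convenience helper, not a DeltaAI-provided tool: the function name latest_sif is hypothetical, it relies on GNU sort -V, and the directory in the usage line is the one used by the sample scripts in this section.

```shell
# Print the highest-versioned SIF file in a directory for a filename prefix.
#   $1 = directory to search, $2 = filename prefix (for example "pytorch_")
latest_sif() {
    local dir=$1 prefix=$2
    # Print nothing if the directory does not exist on this machine
    [ -d "$dir" ] || return 0
    # List matching images, sort by embedded version numbers, keep the last one
    find "$dir" -maxdepth 1 -name "${prefix}*.sif" | sort -V | tail -n 1
}

# Usage: prints a path on a system that has the directory, nothing otherwise
latest_sif /sw/user/NGC_containers pytorch_
```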
The containers are used as shown in the sample scripts below:
PyTorch Example Script
#!/bin/bash
#SBATCH --mem=64g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --partition=test
#SBATCH --time=00:20:00
#SBATCH --job-name=pytorchNGC
### GPU options ###
#SBATCH --gpus-per-node=1
module list # job documentation and metadata
echo "job is starting on `hostname`"
# run the container binary with arguments: python3 <program.py>
time srun \
apptainer run --nv \
--bind /projects \
/sw/user/NGC_containers/pytorch_24.09-py3.sif python3 tensor_gpu.py
exit
TensorFlow Example Script
#!/bin/bash
#SBATCH --mem=80g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --partition=ghx4
#SBATCH --time=00:20:00
#SBATCH --job-name=tfngc
### GPU options ###
#SBATCH --gpus-per-node=1
module list # job documentation and metadata
echo "job is starting on `hostname`"
time apptainer run --nv \
--bind /projects \
/sw/user/NGC_containers/tensorflow_24.09-tf2-py3.sif python3 \
cifar10gpu.py
exit
cuQuantum Appliance Container
The cuQuantum Appliance is an NGC container with Cirq, qsim, and the cuQuantum SDK for GPU-accelerated quantum circuit simulation. The DeltaAI version includes Cray MPICH injection for multi-node MPI over Slingshot 11.
| Component | Version |
|---|---|
| Cirq | 1.6.1 |
| qsimcirq | 0.15.0 |
| cuQuantum SDK | 25.11.0 |
| cusvaer | 0.7.0 |
| Python | 3.11 |
Example: Single-GPU Cirq/qsim job
#!/bin/bash
#SBATCH --account=<account_name>
#SBATCH --partition=ghx4
#SBATCH --nodes=1
#SBATCH --gpus=1
#SBATCH --time=00:20:00
#SBATCH --job-name=qsim-gpu
CONTAINER=/sw/user/NGC_containers/builds/cuquantum-appliance_25.11-arm64-cray-mpich.sif
apptainer exec --nv --bind /projects \
$CONTAINER bash -c '
conda activate cuquantum
python -u my_cirq_script.py
'
Note
The conda environment inside the container is named cuquantum (unversioned).
After conda activate, verify your Python path points inside the container.
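The path check suggested in the note can be sketched as below. The helper name python_in_env is hypothetical. CONDA_PREFIX is the variable that conda activate sets to the active environment's directory, so the check is only meaningful when run inside the container after activation.

```shell
# Return 0 if the interpreter path ($1) lives under the environment prefix ($2).
python_in_env() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Only meaningful after "conda activate cuquantum" inside the container.
if [ -n "${CONDA_PREFIX:-}" ]; then
    if python_in_env "$(command -v python)" "$CONDA_PREFIX"; then
        echo "python resolves inside $CONDA_PREFIX"
    else
        echo "warning: python resolves outside $CONDA_PREFIX" >&2
    fi
fi
```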
Warning
The bundled cuquantum environment in the cuQuantum Appliance 25.11 image
has known issues with DeltaAI’s current driver stack. First, import cirq
fails on Python 3.11 with TypeError: PynvmlFinder.find_spec() got an
unexpected keyword argument 'path', because the bundled pynvml 13.0.1
import hook is incompatible with cirq’s delayed-import finder. Second,
import qsimcirq then fails with ImportError: libnvidia-ml.so.1: cannot
open shared object file, even with apptainer --nv. Prefer the native
NVIDIA CUDA Quantum (CUDA-Q), pennylane, and NVIDIA cuQuantum SDK modules:
they are easier to use, kept up to date with the driver stack, and not
affected by these container-internal failures.
Container list
/sw/user/NGC_containers/builds/cuquantum-appliance_25.11-arm64-cray-mpich.sif
/sw/user/NGC_containers/gromacs_2023.2.sif
/sw/user/NGC_containers/jax_23.10-py3.sif
/sw/user/NGC_containers/jax_24.04-py3.sif
/sw/user/NGC_containers/lammps_patch_15Jun2023.sif
/sw/user/NGC_containers/namd_3.0-alpha3-singlenode.sif
/sw/user/NGC_containers/pytorch_24.07-py3.sif
/sw/user/NGC_containers/pytorch_24.08-py3-compat.sif
/sw/user/NGC_containers/pytorch_24.08-py3.sif
/sw/user/NGC_containers/pytorch_24.09-py3.sif
/sw/user/NGC_containers/tensorflow_23.09-tf2-py3.sif
/sw/user/NGC_containers/tensorflow_24.07-tf2-py3.sif
/sw/user/NGC_containers/tensorflow_24.08-tf2-py3.sif
/sw/user/NGC_containers/tensorflow_24.09-tf2-py3.sif
See the NVIDIA containers catalog for more information.