Containers
Apptainer (formerly Singularity)
Container support on DeltaAI is provided by Apptainer.
Docker images can be converted to the Apptainer SIF format with the apptainer pull command. Commands can be run inside a container with the apptainer run command.
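As a minimal sketch of the pull-then-run workflow described above, the following converts a Docker image to a SIF file and runs one command inside it. The image docker://ubuntu:22.04 and the output file name are illustrative assumptions, not DeltaAI-provided defaults.

```shell
#!/bin/bash
# Hypothetical example image; substitute the Docker image you need.
IMAGE_URI="docker://ubuntu:22.04"
SIF_FILE="ubuntu_22.04.sif"

if command -v apptainer >/dev/null 2>&1; then
    # Convert the Docker image into a local SIF file.
    apptainer pull "$SIF_FILE" "$IMAGE_URI"
    # Run a single command inside the resulting container.
    apptainer run "$SIF_FILE" cat /etc/os-release
else
    echo "apptainer not found in PATH; run this on a DeltaAI node" >&2
fi
```

Large images can take a while to convert and will populate the Apptainer cache, so the cache location discussed in this section is worth setting first.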
If you encounter $HOME quota issues with Apptainer caching in ~/.apptainer, the environment variable APPTAINER_CACHEDIR can be used to select a different location, such as a directory under /scratch. See also the Apptainer build environment documentation.
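A minimal sketch of redirecting the cache, assuming your scratch space lives at /scratch/<username>. That layout, the SCRATCH_BASE variable, and the apptainer_cache directory name are assumptions for illustration; adjust them to your actual scratch path.

```shell
#!/bin/bash
# Assumption: scratch space is /scratch/<username>; override SCRATCH_BASE if not.
SCRATCH_BASE="${SCRATCH_BASE:-/scratch/${USER:-$(id -un)}}"

# Point Apptainer's cache away from $HOME.
export APPTAINER_CACHEDIR="$SCRATCH_BASE/apptainer_cache"

if mkdir -p "$APPTAINER_CACHEDIR" 2>/dev/null; then
    echo "Apptainer cache directory: $APPTAINER_CACHEDIR"
else
    echo "Cannot create $APPTAINER_CACHEDIR; check the scratch path" >&2
fi
```

Adding the export line to your shell startup file makes the setting persistent. The apptainer cache list and apptainer cache clean commands can be used to inspect and empty the cache.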
Your $HOME is automatically available from containers run via Apptainer.
You can pip3 install --user against a container’s Python (use PYTHONUSERBASE with this option), set up virtual environments, or similar while using a containerized application.
Just run the container’s /bin/bash (or use apptainer shell <container> for a quick look from a login node).
Below is an srun example of that with TensorFlow:
arnoldg@gh002:~> cat ~/bin/salloc.sh
#!/bin/bash
salloc \
--mem=64g \
--nodes=1 \
--ntasks-per-node=1 \
--cpus-per-task=8 \
--partition=ghx4 \
--time=00:50:00 \
--job-name=generic \
--gpus-per-node=1
arnoldg@gh002:~> ~/bin/salloc.sh
salloc: Granted job allocation 50737
salloc: Nodes gh076 are ready for job
arnoldg@gh-login02:~/apptainer/NGC/tensorflow> srun apptainer run --nv --bind /projects /sw/user/NGC_containers/tensorflow_24.09-tf2-py3.sif /bin/bash --login
================
== TensorFlow ==
================
NVIDIA Release 24.09-tf2 (build 109642209)
TensorFlow Version 2.16.1
Container image Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright 2017-2024 The TensorFlow Authors. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 12.6 driver version 560.35.03 with kernel driver version 535.183.06.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
python --version
Python 3.10.12
exit
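The pip3 install --user approach with PYTHONUSERBASE mentioned earlier in this section can be sketched as follows. The PYTHONUSERBASE directory name and the install_in_container helper are hypothetical; the container path is the one used in the session above.

```shell
#!/bin/bash
# Container image from the example session above.
SIF=/sw/user/NGC_containers/tensorflow_24.09-tf2-py3.sif

# Hypothetical per-container install location under $HOME, kept separate
# so packages built against this container's Python do not mix with others.
export PYTHONUSERBASE="$HOME/.local/ngc_tensorflow_24.09"

install_in_container() {
    # pip3 runs inside the container; --user installs under PYTHONUSERBASE,
    # which is reachable because $HOME is available inside the container.
    apptainer run "$SIF" pip3 install --user "$@"
}

# Usage: install_in_container <package> [<package> ...]
```

The same PYTHONUSERBASE value needs to be exported when you later run your program in the container, so that Python can find the installed packages.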
NVIDIA NGC Containers
DeltaAI provides NVIDIA NGC Docker containers that are pre-built with Apptainer. Look for the latest binary containers in /sw/user/NGC_containers/.
The containers are used as shown in the sample scripts below:
PyTorch Example Script
#!/bin/bash
#SBATCH --mem=64g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --partition=test
#SBATCH --time=00:20:00
#SBATCH --job-name=pytorchNGC
### GPU options ###
#SBATCH --gpus-per-node=1
module list # job documentation and metadata
echo "job is starting on $(hostname)"
# run the container binary with arguments: python3 <program.py>
time srun \
apptainer run --nv \
--bind /projects \
/sw/user/NGC_containers/pytorch_24.09-py3.sif python3 tensor_gpu.py
exit
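Before submitting a longer job, it can help to confirm that PyTorch inside the container sees the GPU. The sketch below assumes it is run from within a job allocation that has a GPU; the check_gpu helper name is hypothetical.

```shell
#!/bin/bash
# Container image from the PyTorch example script above.
SIF=/sw/user/NGC_containers/pytorch_24.09-py3.sif

check_gpu() {
    # Prints whether CUDA is usable and how many GPUs PyTorch can see.
    srun apptainer run --nv "$SIF" \
        python3 -c 'import torch; print(torch.cuda.is_available(), torch.cuda.device_count())'
}

# Usage (inside a job allocation with a GPU): check_gpu
```

If the check reports that CUDA is unavailable, verify that the --nv flag is present and that the job requested a GPU.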
TensorFlow Example Script
#!/bin/bash
#SBATCH --mem=80g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --partition=ghx4
#SBATCH --time=00:20:00
#SBATCH --job-name=tfngc
### GPU options ###
#SBATCH --gpus-per-node=1
module list # job documentation and metadata
echo "job is starting on $(hostname)"
time apptainer run --nv \
--bind /projects \
/sw/user/NGC_containers/tensorflow_24.09-tf2-py3.sif python3 \
cifar10gpu.py
exit
Container list (as of August 2024)
/sw/user/NGC_containers/cuquantum-appliance_24.08.sif /sw/user/NGC_containers/pytorch_24.08-py3-compat.sif
/sw/user/NGC_containers/gromacs_2023.2.sif /sw/user/NGC_containers/pytorch_24.08-py3.sif
/sw/user/NGC_containers/jax_23.10-py3.sif /sw/user/NGC_containers/pytorch_24.09-py3.sif
/sw/user/NGC_containers/jax_24.04-py3.sif /sw/user/NGC_containers/tensorflow_23.09-tf2-py3.sif
/sw/user/NGC_containers/lammps_patch_15Jun2023.sif /sw/user/NGC_containers/tensorflow_24.07-tf2-py3.sif
/sw/user/NGC_containers/namd_3.0-alpha3-singlenode.sif /sw/user/NGC_containers/tensorflow_24.08-tf2-py3.sif
/sw/user/NGC_containers/pytorch_24.07-py3.sif /sw/user/NGC_containers/tensorflow_24.09-tf2-py3.sif
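Because the list above is a snapshot, a script may prefer to pick the newest image for a framework at run time. The following is a sketch under the assumption that image names keep the <framework>_<version>... .sif pattern shown above; the latest_sif helper is hypothetical.

```shell
#!/bin/bash
latest_sif() {
    # $1: directory holding the images, $2: framework prefix (e.g. pytorch)
    local dir="$1" prefix="$2"
    # Find matching images, version-sort the names, and keep the last one.
    find "$dir" -maxdepth 1 -name "${prefix}_*.sif" 2>/dev/null | sort -V | tail -n 1
}

PYTORCH_SIF="$(latest_sif /sw/user/NGC_containers pytorch)"
echo "Newest PyTorch image: ${PYTORCH_SIF:-none found}"
```

Selection is based purely on the file name, so variants such as the -compat builds are ordered by name rather than by content. Check the result before relying on it in a batch script.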
See the NVIDIA containers catalog for more information.