Containers

Apptainer (formerly Singularity)

Container support on DeltaAI is provided by Apptainer.

Docker images can be converted to Apptainer's SIF format with the apptainer pull command, and commands can be run inside a container with the apptainer run command.
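For example, pulling a public image from Docker Hub and running a command in it might look like the following (the image name and tag are illustrative, not a DeltaAI-provided container):

```shell
# Convert a Docker image to SIF (image name/tag are illustrative).
apptainer pull ubuntu_22.04.sif docker://ubuntu:22.04

# Run an arbitrary command inside the resulting container.
apptainer exec ubuntu_22.04.sif cat /etc/os-release
```

Note that apptainer exec runs the given command directly, while apptainer run invokes the container's runscript (the NGC containers used below pass their arguments through the runscript).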

If you encounter $HOME quota issues caused by Apptainer caching in ~/.apptainer, set the APPTAINER_CACHEDIR environment variable to a different location, such as a directory under /scratch. See also the Apptainer build environment documentation.
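A minimal sketch of relocating the cache, assuming a writable scratch-style directory (the path below is illustrative; on DeltaAI you would typically point it at a directory under /scratch):

```shell
# Point Apptainer's cache away from $HOME to avoid quota issues.
# The base path here is illustrative; on DeltaAI use /scratch instead.
export APPTAINER_CACHEDIR="${TMPDIR:-/tmp}/${USER:-$(id -un)}/apptainer-cache"
mkdir -p "$APPTAINER_CACHEDIR"
echo "Apptainer cache: $APPTAINER_CACHEDIR"
```

Export APPTAINER_CACHEDIR (for example in ~/.bashrc or your batch scripts) before running apptainer pull so image layers are cached there.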

Your $HOME is automatically available inside containers run via Apptainer. You can pip3 install --user against a container's Python (set PYTHONUSERBASE when using this option), set up virtual environments, or similar while using a containerized application. Just run the container's /bin/bash (or use apptainer shell <container> for a quick look from a login node). Below is an srun example of that with TensorFlow:

arnoldg@gh002:~> cat ~/bin/salloc.sh
#!/bin/bash
salloc \
 --mem=64g \
 --nodes=1 \
 --ntasks-per-node=1 \
 --cpus-per-task=8 \
 --partition=ghx4 \
 --time=00:50:00 \
 --job-name=generic \
 --gpus-per-node=1

arnoldg@gh002:~> ~/bin/salloc.sh
salloc: Granted job allocation 50737
salloc: Nodes gh076 are ready for job
arnoldg@gh002:~> srun apptainer run --nv --bind /projects /sw/user/NGC_containers/tensorflow_24.07-tf2-py3-igpu.sif /bin/bash --login

NVIDIA Release 24.07-tf2 (build 100465222)
TensorFlow Version 2.16.1
Container image Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright 2017-2024 The TensorFlow Authors.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

python --version
Python 3.10.12
exit
arnoldg@gh002:~>
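The pip3 install --user workflow mentioned above can be sketched as follows, from a shell inside the container (the PYTHONUSERBASE path and package name are illustrative):

```shell
# Inside the container (e.g. after apptainer shell <container>):
# keep user-installed packages out of the default ~/.local.
export PYTHONUSERBASE=$HOME/tf_user_base   # illustrative location
pip3 install --user mypackage              # "mypackage" is a placeholder

# Set PYTHONUSERBASE the same way in later jobs so the container's
# python finds the packages installed above.
```

Using a separate PYTHONUSERBASE per container avoids mixing packages built against different container Python environments.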

NVIDIA NGC Containers

DeltaAI provides NVIDIA NGC Docker containers that are pre-built as Apptainer images. Look for the latest binary containers in /sw/external/NGC/. The containers are used as shown in the sample scripts below:

PyTorch Example Script

#!/bin/bash
#SBATCH --mem=64g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --partition=test
#SBATCH --time=00:20:00
#SBATCH --job-name=pytorchNGC
### GPU options ###
#SBATCH --gpus-per-node=1

module list  # job documentation and metadata

echo "job is starting on $(hostname)"

# run the container binary with arguments: python3 <program.py>
time srun \
  apptainer run --nv \
    --bind /projects \
    /sw/user/NGC_containers/pytorch_24.07-py3.sif python3 tensor_gpu.py

exit
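Assuming the script above is saved as pytorch_ngc.sbatch (the filename is hypothetical), it can be submitted and monitored with standard Slurm commands:

```shell
sbatch pytorch_ngc.sbatch   # submit the batch script; prints the job ID
squeue -u $USER             # check the job's queue status
```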

TensorFlow Example Script

#!/bin/bash
#SBATCH --mem=80g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --partition=ghx4
#SBATCH --time=00:20:00
#SBATCH --job-name=tfngc
### GPU options ###
#SBATCH --gpus-per-node=1

module list  # job documentation and metadata

echo "job is starting on $(hostname)"

time apptainer run --nv \
  --bind /projects \
  /sw/user/NGC_containers/tensorflow_24.07-tf2-py3-igpu.sif python3 \
  cifar10gpu.py

exit

Container list (as of August 2024)

/sw/user/NGC_containers/gromacs_2023.2.sif
/sw/user/NGC_containers/lammps_patch_15Jun2023.sif
/sw/user/NGC_containers/namd_3.0-alpha3-singlenode.sif
/sw/user/NGC_containers/pytorch_24.07-py3.sif
/sw/user/NGC_containers/tensorflow_24.07-tf2-py3-igpu.sif

See the NVIDIA containers catalog for more information.