TensorFlow on DeltaAI

Summary

  • TensorFlow can be run from an NGC container, such as tensorflow_24.09-tf2-py3.sif (in /sw/user/NGC_containers), or from our module, python/miniforge3_tensorflow_cuda.

  • Power users can install TensorFlow in their own venv or conda environment with: pip install --extra-index-url=https://developer.download.nvidia.com/compute/redist nvidia_tensorflow==2.17.0+nv24.11

  • jupyter-notebook is available in both the container and our module.

  • Remember to add the --nv flag to the srun apptainer command line when using any NGC container.
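As a rough illustration of the --nv reminder above, the sketch below wraps an srun apptainer command in a small shell function that asks TensorFlow which GPUs it can see. The account and partition names are hypothetical placeholders, and the resource flags are assumptions to adapt to your allocation. With DRY_RUN set, the function only prints the command so it can be reviewed before anything is submitted.

```shell
# Container path taken from the summary above.
CONTAINER=/sw/user/NGC_containers/tensorflow_24.09-tf2-py3.sif

# Hypothetical placeholders: replace with your own account and partition.
ACCOUNT=my_account
PARTITION=my_partition

# Ask TensorFlow inside the container which GPUs it can see.
# When DRY_RUN is set, the command is printed instead of executed.
check_gpus() {
    ${DRY_RUN:+echo} srun --account="$ACCOUNT" --partition="$PARTITION" \
        --gpus-per-node=1 --cpus-per-task=64 \
        apptainer exec --nv "$CONTAINER" \
        python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}

# Print the command for review; drop DRY_RUN=1 to actually run it.
DRY_RUN=1 check_gpus
```

If the --nv flag is left out, the container is not given access to the host NVIDIA driver, so a check like this would be expected to report no GPUs.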

Customization

The container does not support Python venv (it is not installed), and conda is not available inside the container. Instead, set the PYTHONUSERBASE environment variable to a (possibly shared) path where pip install --user will place your additions to the TensorFlow container’s Python. If you are using a Jupyter notebook, choose “restart kernel” from the menu to make your changes visible to Jupyter. See also: PYTHONUSERBASE.

Installing from within the Container

arnoldg@gh001:~> export PYTHONUSERBASE=/projects/bbka/arnoldg/tensorflow_modules
arnoldg@gh001:~> apptainer shell --bind /projects /sw/user/NGC_containers/tensorflow_24.09-tf2-py3.sif
Apptainer> pip install --user matplotlib
...
Successfully installed contourpy-1.2.1 cycler-0.12.1 fonttools-4.53.1 kiwisolver-1.4.5 matplotlib-3.9.0 pillow-10.4.0
Apptainer> python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Could not open PYTHONSTARTUP
FileNotFoundError: [Errno 2] No such file or directory: '/etc/pythonstart'
>>> import matplotlib
>>> exit()
Apptainer> echo $PYTHONUSERBASE
/projects/bbka/arnoldg/tensorflow_modules
Apptainer> ls $PYTHONUSERBASE/lib/python3.10/site-packages/
PIL                        fontTools                   mpl_toolkits
__pycache__                fonttools-4.53.1.dist-info  pillow-10.4.0.dist-info
contourpy                  kiwisolver                  pillow.libs
contourpy-1.2.1.dist-info  kiwisolver-1.4.5.dist-info  pylab.py
cycler                     matplotlib
cycler-0.12.1.dist-info    matplotlib-3.9.0.dist-info
Apptainer>

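Package Install Location

If PYTHONUSERBASE is not set, pip install --user places packages under ~/.local in your home directory: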

arnoldg@gh001:~/.local/lib/python3.10/site-packages> pwd
/u/arnoldg/.local/lib/python3.10/site-packages
arnoldg@gh001:~/.local/lib/python3.10/site-packages> ls
contourpy                  fontTools                   matplotlib                  pillow-10.4.0.dist-info
contourpy-1.2.1.dist-info  fonttools-4.53.1.dist-info  matplotlib-3.9.1.dist-info  pillow.libs
cycler                     kiwisolver                  mpl_toolkits                __pycache__
cycler-0.12.1.dist-info    kiwisolver-1.4.5.dist-info  PIL                         pylab.py
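To confirm where pip install --user will put packages before installing anything, you can ask Python's site module directly. This is a minimal sketch; the /projects/my_project path is a hypothetical example, and the directory does not need to exist for the check to work.

```shell
# Default location: with PYTHONUSERBASE unset, the user site lives under ~/.local.
( unset PYTHONUSERBASE
  python3 -c 'import site; print(site.getusersitepackages())' )

# Overridden location: the path below is a hypothetical example.
PYTHONUSERBASE=/projects/my_project/tensorflow_modules \
    python3 -c 'import site; print(site.getusersitepackages())'
```

The second command should print a site-packages directory under the chosen base. Running the same commands at the Apptainer> prompt shows the location that the container's Python will use.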

Runtime Items of Note

Request plenty of CPU cores when using this container or module (for example, --cpus-per-task=64). It takes quite a few ARM cores to keep the H100 GPUs working at peak.
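One way to check that a job actually received the cores it asked for is to print the core count from inside the job step. This is a small sketch, assuming the commands are run inside a Slurm job launched with --cpus-per-task; outside such a job the Slurm variable is simply reported as unset.

```shell
# Cores this process may use (nproc respects the CPU affinity of the job step).
echo "usable cores: $(nproc)"

# Exported by Slurm when --cpus-per-task is given; "unset" outside such a job.
echo "SLURM_CPUS_PER_TASK: ${SLURM_CPUS_PER_TASK:-unset}"
```

If the reported core count is much lower than expected, the data pipeline feeding the GPUs may be limited by the CPU allocation.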