TensorFlow on DeltaAI
Summary
TensorFlow runs from NGC containers such as tensorflow_24.09-tf2-py3.sif (in /sw/user/NGC_containers).
Power users will run into errors or failed installs when trying to build their own environments beyond the container. To work around this, pip install --user into $HOME or a $PYTHONUSERBASE (see below).
jupyter-notebook is in the container.
Remember to add the --nv flag to the srun apptainer command line when using any NGC container.
Run TensorFlow
Warning
TensorFlow on DeltaAI must use the NGC container. NVIDIA has told us that it is not possible to get a GPU-enabled TensorFlow by other means (pip or conda installs), and DeltaAI admins have confirmed that locally. If you install TensorFlow on your own, it fails at runtime with this error:
> python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
<jemalloc>: Unsupported system page size
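The one-line check in the warning above can also be written as a small script that separates "TensorFlow is missing" from "TensorFlow is present but sees no GPU". This is a minimal sketch, not an official DeltaAI tool: the function name visible_gpus is hypothetical, and the only TensorFlow call used is tf.config.list_physical_devices, the same one shown in the warning.

```python
import importlib.util


def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is not importable.

    visible_gpus is a hypothetical helper name used for illustration only.
    """
    if importlib.util.find_spec("tensorflow") is None:
        # Most likely running outside the NGC container.
        return None
    import tensorflow as tf

    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this Python environment.")
    elif not gpus:
        print("TensorFlow imported, but no GPU is visible (was --nv passed to apptainer?).")
    else:
        print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
```

Run inside the container, an empty list usually suggests that the --nv flag was left off the apptainer command line.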
Customization
The container does not support Python venv (it's not installed), and conda is not available inside the container. Instead, use the PYTHONUSERBASE environment variable to specify a (possibly shared) path where you will install additions to the TensorFlow container's Python. If you are using a Jupyter notebook, you will need to "restart kernel" from the menu to make your changes visible to Jupyter. See also: PYTHONUSERBASE.
Installing from within the Container
arnoldg@gh001:~> export PYTHONUSERBASE=/projects/bbka/arnoldg/tensorflow_modules
arnoldg@gh001:~> apptainer shell --bind /projects /sw/user/NGC_containers/tensorflow_24.09-tf2-py3.sif
Apptainer> pip install --user matplotlib
...
Successfully installed contourpy-1.2.1 cycler-0.12.1 fonttools-4.53.1 kiwisolver-1.4.5 matplotlib-3.9.0 pillow-10.4.0
Apptainer> python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Could not open PYTHONSTARTUP
FileNotFoundError: [Errno 2] No such file or directory: '/etc/pythonstart'
>>> import matplotlib
>>> exit()
Apptainer> echo $PYTHONUSERBASE
/projects/bbka/arnoldg/tensorflow_modules
Apptainer> ls $PYTHONUSERBASE/lib/python3.10/site-packages/
PIL fontTools mpl_toolkits
__pycache__ fonttools-4.53.1.dist-info pillow-10.4.0.dist-info
contourpy kiwisolver pillow.libs
contourpy-1.2.1.dist-info kiwisolver-1.4.5.dist-info pylab.py
cycler matplotlib
cycler-0.12.1.dist-info matplotlib-3.9.0.dist-info
Apptainer>
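The session above confirms the install by listing the directory. The same check can be made from Python by asking where a module would be imported from. This is a sketch under assumptions: module_location and is_under are hypothetical helper names, and matplotlib is used only because it is the package installed in the session.

```python
import importlib.util
import os
from pathlib import Path


def module_location(name):
    """Return the file a module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def is_under(path, base):
    """Return True if path lies inside the directory tree rooted at base."""
    try:
        Path(path).resolve().relative_to(Path(base).resolve())
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    base = os.environ.get("PYTHONUSERBASE")
    location = module_location("matplotlib")
    if location is None:
        print("matplotlib is not importable from this interpreter")
    elif base and is_under(location, base):
        print(f"matplotlib comes from PYTHONUSERBASE: {location}")
    else:
        print(f"matplotlib comes from elsewhere: {location}")
```

If the script reports that the package comes from elsewhere, PYTHONUSERBASE was probably not exported before the container was started.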
Package Install Location
arnoldg@gh001:~/.local/lib/python3.10/site-packages> pwd
/u/arnoldg/.local/lib/python3.10/site-packages
arnoldg@gh001:~/.local/lib/python3.10/site-packages> ls
contourpy fontTools matplotlib pillow-10.4.0.dist-info
contourpy-1.2.1.dist-info fonttools-4.53.1.dist-info matplotlib-3.9.1.dist-info pillow.libs
cycler kiwisolver mpl_toolkits __pycache__
cycler-0.12.1.dist-info kiwisolver-1.4.5.dist-info PIL pylab.py
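The listing above is the default location under $HOME/.local, which pip install --user uses when PYTHONUSERBASE is not set. A short sketch such as the following shows which directory a given setting resolves to. The helper name user_site is hypothetical; site.getusersitepackages() is part of the Python standard library, and the example path is the one used earlier on this page.

```python
import os
import subprocess
import sys


def user_site(python_user_base=None):
    """Ask a fresh interpreter where user-installed packages are looked up.

    A fresh process is used because the site module caches the user base
    after first computing it, so changing os.environ later has no effect.
    """
    env = dict(os.environ)
    env.pop("PYTHONUSERBASE", None)
    if python_user_base is not None:
        env["PYTHONUSERBASE"] = python_user_base
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getusersitepackages())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("default:            ", user_site())
    print("with PYTHONUSERBASE:", user_site("/projects/bbka/arnoldg/tensorflow_modules"))
```

Packages installed under one location are not visible when the other is in effect, which is one reason the same package can appear at different versions in the two listings.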
Runtime Items of Note
Request plenty of CPU cores when running this container (--cpus-per-task=64). It takes quite a few ARM cores to keep the H100 GPUs working at peak.
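Inside the job, a script can find out how many cores it was actually given and size its input pipeline to match. The sketch below assumes the job was launched through Slurm with --cpus-per-task, in which case Slurm sets the SLURM_CPUS_PER_TASK environment variable. The helper name cpus_for_job is hypothetical.

```python
import os


def cpus_for_job(default=1):
    """Return the number of CPU cores this process may use.

    Prefers SLURM_CPUS_PER_TASK (set by Slurm when --cpus-per-task is given),
    then the CPU affinity mask on Linux, then the machine's total core count.
    """
    value = os.environ.get("SLURM_CPUS_PER_TASK", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or default


if __name__ == "__main__":
    print(f"CPU cores available to this task: {cpus_for_job()}")
```

The returned value could then be used, for example, as the num_parallel_calls argument of a tf.data map step, so that data loading scales with the cores requested.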