Installed Software
DeltaAI software is provisioned using the HPE Cray Programming Environment (CPE). Select NVIDIA NGC containers are made available (see Containers) and are periodically updated from the NVIDIA NGC site. An automated list of available software can be found on the ACCESS website.
Modules/Lmod
DeltaAI provides HPE/Cray modules and compilers. The functional programming environments are PrgEnv-gnu and PrgEnv-cray. The default environment loads PrgEnv-gnu.
Use module spider package_name to search for software in Lmod and see the steps to load it in your environment.
See also: User Guide for Lmod.
Please submit a support request for help with software not currently installed on DeltaAI. The DeltaAI project office reviews general installation requests, considering how broadly the software will be used and the effort required to install it.
Python
Note
When submitting support requests for python, please provide the following and understand that DeltaAI support staff time is a finite resource while python developments (new software and modules) are growing at nearly infinite velocity:
Python version or environment used (describe fully, with the commands needed to reproduce)
Error output or log from what went wrong (screenshots are more difficult to work with than text data)
Pertinent URLs describing what you were following/attempting (if applicable); note that URL recipes specific to vendors may be difficult to reproduce when not using their cloud resources (Google Colab, for example)
DeltaAI’s architecture is aarch64 and many python packages may not be built for it. If you cannot find a python wheel, building from source may be the only option. There is no guarantee your desired software can be ported to the new architecture with minimal effort.
TensorFlow is only supported from NVIDIA’s NGC container. Python software stacks that require TensorFlow may be difficult (or impossible) to adapt to DeltaAI. See the notes about it at TensorFlow on DeltaAI.
On DeltaAI, you may install your own python software stacks, as needed.
There are choices when customizing your python setup. If you anticipate maintaining multiple python environments or installing many packages, you may want to target a filesystem with more quota space (not $HOME) for your environments; /scratch or /projects may be more appropriate in that case (see the venv sketch after the list below).
You may use any of these methods with any of the python versions or instances described below (or you may install your own python versions):
venv (python virtual environment)
Can name environments (metadata) and have multiple environments per python version or instance. pip installs are local to the environment. You specify the path when using venv:
python -m venv /path/to/env
conda (or miniforge) environments
Similar to venv but with more flexibility, see this comparison table. See also the miniforge environment option: miniforge. pip and conda installs are local to the environment and the location defaults to $HOME/.conda. You can override the default location in $HOME by using the --prefix syntax: conda create --prefix /path/to/env. You can also relocate your .conda directory to your project space, which has a larger quota than your home directory.
pip3
pip3 install --user <python_package>
CAUTION: Python modules installed this way into your $HOME/.local/ will match on python versions. This can create incompatibilities between containers or python venv or conda environments when they have a common python version number. You can work around this by using the PYTHONUSERBASE environment variable. That will also allow for shared pip installs if you choose a group-shared directory.
conda-env-mod Lmod module generator from Purdue
The conda-env-mod script will generate a python module you can load or share with your team. This makes it simpler to manage multiple python scenarios that you can activate and deactivate with module commands.
pyenv python version management
Pyenv helps you manage multiple python versions. You can also use more than one python version at once in a project using pyenv.
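For example, here is a minimal venv sketch targeting a filesystem with more quota than $HOME. The /projects/<project_path> path and the package name are placeholders; adjust them for your own work:
module load cray-python                                 # or another python module of your choice
python -m venv /projects/<project_path>/envs/myenv     # create the venv outside $HOME
source /projects/<project_path>/envs/myenv/bin/activate
pip install --upgrade pip
pip install numpy                                       # packages land in the venv, not $HOME/.local
deactivate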
Note
The NVIDIA NGC containers on DeltaAI provide optimized python frameworks built for DeltaAI’s H100 GPUs. DeltaAI staff recommend using an NGC container when possible with the GPU nodes (or use the anaconda3_gpu module).
Python (a recent or latest version)
If you don’t need all the extra modules provided by Anaconda, use the basic python installation provided by Cray or install your own for aarch64.
You can add modules via pip3 install --user <modulename>, set up virtual environments, and customize, as needed, for your workflow starting from a smaller installed base of python than Anaconda.
$ module load cray-python
$ which python
/opt/cray/pe/python/3.11.7/bin/python
cray-python includes: numpy, mpi4py, and pandas.
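As a quick check that those packages are available (a minimal sketch; the output will vary with the installed versions):
module load cray-python
python -c "import numpy, mpi4py, pandas; print(numpy.__version__, pandas.__version__)"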
miniforge3
python/miniforge3_pytorch
Use python from the python/miniforge3_pytorch module if you need some of the modules provided by conda-forge in your python workflow.
See the Managing Environments section of the conda getting started guide to learn how to customize conda for your workflow and add extra python modules to your environment.
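For example, a minimal sketch of creating and using a conda environment outside $HOME, assuming the python/miniforge3_pytorch module places conda on your PATH; the /projects/<project_path> path, python version, and package below are placeholders to adjust for your own work:
module load python/miniforge3_pytorch
conda create --prefix /projects/<project_path>/envs/myenv python=3.11
conda activate /projects/<project_path>/envs/myenv    # requires conda to be initialized in your shell
conda install -c conda-forge scipy                    # installs into the active (prefix) environment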
Note
If you use conda with NGC containers, take care to use python from the container and not python from conda or one of its environments. The container’s python should be first in $PATH. You may --bind the conda directory or other paths into the container so that you can start your conda environments with the container’s python (/usr/bin/python).
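For example, a minimal sketch using the pytorch NGC container referenced later in this guide (the bind paths are illustrative; adjust them to your own directories):
apptainer exec --nv --bind $HOME/.conda,/projects/<project_path> \
  /sw/user/NGC_containers/pytorch_24.07-py3.sif which python
The command above should report /usr/bin/python, confirming that the container’s python is the one being used.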
The Anaconda archive contains previous Anaconda versions. The bundles are not small, but using one from Anaconda ensures that you get software that was built to work together. If you require an older version of a python library/module, NCSA staff suggest looking back in time at the Anaconda site (though the available history is limited, since the grace-hopper aarch64 architecture in DeltaAI is new).
Python Environments with conda
See the Conda configuration documentation if you want to disable automatic conda environment activation.
Note
When using your own custom conda environment with a batch job, submit the batch job from within the environment and do not add conda activate commands to the job script; the job inherits your environment.
Batch Jobs
Batch jobs will honor the commands you execute within them. Purge/unload/load modules as needed for that job.
A clean slate might resemble the following (this user has a conda init clause in .bashrc for a custom environment):
conda deactivate
conda deactivate # just making sure
module reset # load the default DeltaAI modules
conda activate base
# commands to load modules and activate environs such that your environment is active before
# you use slurm ( do not include conda activate commands in the slurm script )
sbatch myjob.slurm # or srun or salloc
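A minimal myjob.slurm for this workflow might look like the following sketch. The account, partition, and resource values are placeholders to adjust, and my_script.py stands in for your own program; note that the script contains no conda activate line, because the job inherits the environment that was active when you ran sbatch:
#!/bin/bash
#SBATCH --account=<account_name>
#SBATCH --partition=ghx4
#SBATCH --time=00:30:00
#SBATCH --mem=32g
#SBATCH --gpus-per-node=1

which python    # should show the python from the environment you activated before sbatch
python my_script.py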
Non-python/conda HPC users would see per-job stderr from the conda deactivate above (user has never run conda init bash):
[arnoldg@gh-login03 ~]$ conda deactivate
bash: conda: command not found
[arnoldg@gh-login03 ~]$
# or
[arnoldg@gh-login03 ~]$ conda deactivate
CommandNotFoundError: Your shell has not been properly configured to use 'conda deactivate'.
To initialize your shell, run
$ conda init <SHELL_NAME>
Currently supported shells are:
- bash
- tcsh
- zsh
See 'conda init --help' for more information and options.
IMPORTANT: You may need to close and restart your shell after running 'conda init'.
PyTorch
Information on how to set up and run PyTorch.
TensorFlow
Information on how to set up and run TensorFlow.
Containers
See Containers.
Jupyter Notebooks
Warning
This section is under construction.
Note
The DeltaAI Open OnDemand (OOD) dashboard provides an easy method to start a Jupyter notebook; this is the recommended method.
Go to OOD Jupyter interactive app for instructions on how to start an OOD JupyterLab session.
You can also customize your OOD JupyterLab environment.
Do not run Jupyter on the shared login nodes. Instead, follow these steps to attach a Jupyter notebook running on a compute node to your local web browser:
The Jupyter notebook executables are in your $PATH after loading the anaconda3 module. If you run into problems from a previously saved Jupyter session (for example, you see paths where you do not have write permission), you may remove this file to get a fresh start: $HOME/.jupyter/lab/workspaces/default-*.
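For example, to clear the saved workspace state (these files are recreated the next time JupyterLab starts):
rm $HOME/.jupyter/lab/workspaces/default-*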
Follow these steps to run Jupyter on a compute node (CPU or GPU):
On your local machine/laptop, open a terminal.
SSH into DeltaAI. (Replace <my_deltaai_username> with your DeltaAI login username.)
ssh <my_deltaai_username>@gh-login.delta.ncsa.illinois.edu
Enter your NCSA password and complete the Duo MFA. Note, the terminal will not show your password (or placeholder symbols such as asterisks [*]) as you type.
Warning
If there is a conda environment active when you log into DeltaAI, deactivate it before you continue. You will know you have an active conda environment if your terminal prompt has an environment name in parentheses prepended to it, like these examples:
(base) [<gh-login_username>@gh-login01 ~]$
(mynewenv) [<gh-login_username>@gh-login01 ~]$
Run conda deactivate until there is no longer a name in parentheses prepended to your terminal prompt. When you don’t have any conda environment active, your prompt will look like this:
[<gh-login_username>@gh-login01 ~]$
Load the appropriate anaconda module. To see all of the available anaconda modules, run module avail anaconda. This example uses python/miniforge3_pytorch.
module load python/miniforge3_pytorch
Verify the module is loaded.
module list
Verify a jupyter-notebook is in your $PATH.
which jupyter-notebook
Generate a MYPORT number and copy it to a notepad (you will use it in steps 9 and 12).
MYPORT=$(($(($RANDOM % 10000))+49152)); echo $MYPORT
Find the account_name that you are going to use and copy it to a notepad (you will use it in step 9); your accounts are listed under Project when you run the accounts command.
Note
To use a GPU node, you must pick a GPU account (the account name will end in “…-gpu”).
accounts
Run the following srun command, with these replacements:
Replace <account_name> with the account you are going to use, which you found and copied in step 8.
Replace <$MYPORT> with the $MYPORT number you generated in step 7.
Modify the --partition, --time, and --mem options and/or add other options to meet your needs.
srun --account=<account_name> --partition=ghx4 --time=00:30:00 --mem=32g jupyter-notebook --no-browser --port=<$MYPORT> --ip=0.0.0.0
Copy the last 5 lines returned, beginning with “To access the notebook, open this file in a browser…”, to a notepad (you will use this information in steps 12 and 14). (It may take a few minutes for these lines to be returned.)
Note these two things about the URLs you copied:
The first URL begins with http://<ghXXX>.delta...; <ghXXX> is the internal hostname and will be used in step 12.
The second URL begins with http://127.0...; you will use this entire URL in step 14.
Open a second terminal on your local machine/laptop.
Run the following ssh command, with these replacements:
Replace <my_deltaai_username> with your DeltaAI login username.
Replace <$MYPORT> with the $MYPORT number you generated in step 7.
Replace <ghXXX> with the internal hostname you copied in step 10.
ssh -l <my_deltaai_username> -L 127.0.0.1:<$MYPORT>:<ghXXX>.delta.ncsa.illinois.edu:<$MYPORT> gh-login.delta.ncsa.illinois.edu
Enter your NCSA password and complete the Duo MFA. Note, the terminal will not show your password (or placeholder symbols such as asterisks [*]) as you type.
Copy and paste the entire second URL from step 10 (begins with http://127.0...) into your browser. You will be connected to the Jupyter instance running on your compute node of DeltaAI.
Follow these steps to run Jupyter on a compute node, in an NGC container:
On your local machine/laptop, open a terminal.
SSH into DeltaAI. (Replace <my_deltaai_username> with your DeltaAI login username.)
ssh <my_deltaai_username>@gh-login.delta.ncsa.illinois.edu
Enter your NCSA password and complete the Duo MFA. Note, the terminal will not show your password (or placeholder symbols such as asterisks [*]) as you type.
Generate a $MYPORT number and copy it to a notepad (you will use it in steps 6, 8, and 14).
MYPORT=$(($(($RANDOM % 10000))+49152)); echo $MYPORT
Find the account_name that you are going to use and copy it to a notepad (you will use it in step 6); your accounts are listed under Project when you run the accounts command.
Note
To use a GPU node, you must pick a GPU account (the account name will end in “…-gpu”).
accounts
Run the following srun command, with these replacements:
Replace <account_name> with the account you are going to use, which you found and copied in step 5.
Replace <project_path> with the name of your projects folder (in two places).
Replace <$MYPORT> with the $MYPORT number you generated in step 4.
Modify the --partition, --time, --mem, and --gpus-per-node options and/or add other options to meet your needs.
srun --account=<account_name> --partition=ghx4-interactive --time=00:30:00 --mem=64g --gpus-per-node=1 apptainer run --nv --bind /projects/<project_path> /sw/user/NGC_containers/pytorch_24.07-py3.sif jupyter-notebook --notebook-dir /projects/<project_path> --no-browser --port=<$MYPORT> --ip=0.0.0.0
Copy the last 2 lines returned (beginning with “Or copy and paste this URL…”) to a notepad. (It may take a few minutes for these lines to be returned.)
Modify the URL you copied in step 7 by changing hostname:8888 to 127.0.0.1:<$MYPORT>. You will use the modified URL in step 16. (Replace <$MYPORT> with the $MYPORT number you generated in step 4.)
Open a second terminal.
SSH into DeltaAI. (Replace <my_deltaai_username> with your DeltaAI login username.)
ssh <my_deltaai_username>@gh-login.delta.ncsa.illinois.edu
Enter your NCSA password and complete the Duo MFA. Note, the terminal will not show your password (or placeholder symbols such as asterisks [*]) as you type.
Find the internal hostname for your job and copy it to a notepad (you will use it in step 14).
squeue -u $USER
The value returned under NODELIST is the internal hostname for your GPU job (ghXXX). You can now close this terminal.
Open a third terminal.
Run the following ssh command, with these replacements:
Replace <my_deltaai_username> with your DeltaAI login username.
Replace <$MYPORT> with the $MYPORT number you generated in step 4.
Replace <ghXXX> with the internal hostname you copied in step 12.
ssh -l <my_deltaai_username> -L 127.0.0.1:<$MYPORT>:<ghXXX>.delta.internal.ncsa.edu:<$MYPORT> gh-login.delta.ncsa.illinois.edu
Enter your NCSA password and complete the Duo MFA. Note, the terminal will not show your password (or placeholder symbols such as asterisks [*]) as you type.
Copy and paste the entire modified URL (beginning with http://127.0...) from step 8 into your browser. You will be connected to the Jupyter instance running on your GPU node of DeltaAI.
List of Installed Software (CPU & GPU)
See: module avail.