Installed Software

DeltaAI software is provisioned using the HPE Cray Programming Environment (CPE). Select NVIDIA NGC containers are made available (see Containers) and are periodically updated from the NVIDIA NGC site. An automated list of available software can be found on the ACCESS website.

Modules/Lmod

DeltaAI provides HPE/Cray modules and compilers. The functional programming environments are PrgEnv-gnu and PrgEnv-cray. The default environment loads PrgEnv-gnu.

Use module spider package_name to search for software in Lmod and see the steps to load it in your environment.
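As a rough illustration of the Lmod commands mentioned above, the sketch below lists the default modules, searches for a package, and switches programming environments. It assumes it runs in a shell where Lmod is set up (a DeltaAI login node, for example); cray-python is used only as an example package name, and HAVE_LMOD is a hypothetical helper variable.

```shell
#!/bin/bash
# Run on a DeltaAI login node; elsewhere the guard just reports that Lmod is missing.

if command -v module >/dev/null 2>&1; then
    HAVE_LMOD=yes
    module list                          # modules loaded by the default environment
    module spider cray-python            # search for a package and show how to load it
    module swap PrgEnv-gnu PrgEnv-cray   # switch to the Cray programming environment
    module reset                         # return to the default DeltaAI modules
else
    HAVE_LMOD=no
    echo "module command not found; Lmod is not set up in this shell"
fi
```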

See also: User Guide for Lmod.

Please submit a support request for help with software not currently installed on DeltaAI. For general installation requests, the DeltaAI project office will review requests for broad use and installation effort.

Python

Note

When submitting a support request for Python, please provide the following. Keep in mind that DeltaAI support staff time is finite, while the Python ecosystem (new software and modules) grows very quickly:

  • Python version or environment used (describe fully, with the commands needed to reproduce)

  • Error output or log from what went wrong (screenshots are more difficult to work with than text data)

  • Pertinent URLs describing what you were following or attempting (if applicable). Note that vendor-specific recipes may be difficult to reproduce when not using the vendor’s cloud resources (Google Colab, for example).

  • DeltaAI’s architecture is aarch64, and many Python packages are not built for it. If you cannot find a Python wheel for aarch64, building from source may be the only option. There is no guarantee that your desired software can be ported to the new architecture with minimal effort.

  • TensorFlow is only supported from NVIDIA’s NGC container. Python software stacks that require TensorFlow may be difficult (or impossible) to adapt to DeltaAI. See the notes about it at TensorFlow on DeltaAI.
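One way to gather the details listed above as plain text is a small collection script. The following is a minimal sketch, not an official DeltaAI tool; the report file name is a hypothetical choice, and the script only records what the current shell can see.

```shell
#!/bin/bash
# Sketch: collect environment details as plain text for a support request.

REPORT="${REPORT:-python_support_report.txt}"   # hypothetical output file name

{
    echo "date: $(date)"
    echo "host: $(uname -n)"
    echo "arch: $(uname -m)"    # expected to be aarch64 on DeltaAI
    echo "python: $(command -v python3 || echo 'not found')"
    python3 --version 2>&1
    echo "--- loaded modules ---"
    # "module" exists only where Lmod is set up; skip quietly elsewhere.
    if command -v module >/dev/null 2>&1; then module list 2>&1; fi
    echo "--- installed packages ---"
    python3 -m pip list 2>&1
} > "$REPORT"

echo "wrote $REPORT"
```

Attach the resulting text file, together with the exact commands that produced the error, instead of a screenshot.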

On DeltaAI, you may install your own python software stacks, as needed. There are choices when customizing your python setup. If you anticipate maintaining multiple python environments or installing many packages, you may want to target a filesystem with more quota space (not $HOME) for your environments. /scratch or /projects may be more appropriate in that case. You may use any of these methods with any of the python versions or instances described below (or you may install your own python versions):

  • venv (python virtual environment)

    Lets you name environments and keep multiple environments per Python version or instance. pip installs are local to the environment. You specify the path when creating the environment: python -m venv /path/to/env.

  • conda (or miniforge) environments

    Similar to venv but with more flexibility; see this comparison table. See also the miniforge environment option: miniforge. pip and conda installs are local to the environment, and the location defaults to $HOME/.conda. You can override the default location by using the --prefix syntax: conda create --prefix /path/to/env. You can also relocate your .conda directory to your project space, which has a larger quota than your home directory.

  • pip3: pip3 install --user <python_package>

    CAUTION: Python modules installed this way go into $HOME/.local/ and are picked up by every Python with the same version number. This can create incompatibilities between containers, Python venvs, and conda environments that share a Python version. You can work around this by setting the PYTHONUSERBASE environment variable. That also allows for shared pip installs if you choose a group-shared directory.

  • conda-env-mod Lmod module generator from Purdue

    The conda-env-mod script will generate a python module you can load or share with your team. This makes it simpler to manage multiple python scenarios that you can activate and deactivate with module commands.

  • pyenv python version management

    Pyenv helps you manage multiple python versions. You can also use more than one python version at once in a project using pyenv.
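The PYTHONUSERBASE workaround mentioned under pip3 can be sketched as follows. The directory is a hypothetical placeholder (on DeltaAI a path under /projects or /scratch would be the more likely choice), and USERBASE_ROOT, PY_VER, and REPORTED_BASE are illustrative variable names, not DeltaAI conventions.

```shell
#!/bin/bash
# Sketch: redirect "pip3 install --user" away from $HOME/.local.

# Hypothetical location; replace with a project or scratch path of your own.
USERBASE_ROOT="${USERBASE_ROOT:-${TMPDIR:-/tmp}/pip-userbase-demo}"

# Use one directory per Python version so installs for different versions never mix.
PY_VER="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
export PYTHONUSERBASE="${USERBASE_ROOT}/py${PY_VER}"
mkdir -p "${PYTHONUSERBASE}"

# Ask Python where user installs will go. The exit status of this command only
# reports whether the user site is enabled, so it is ignored here.
REPORTED_BASE="$(python3 -m site --user-base || true)"
echo "pip3 install --user will now target: ${REPORTED_BASE}"
```

Setting the variable in a job script or module file, rather than in your shell startup file, keeps the redirection limited to the workflows that need it.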

Note

The NVIDIA NGC Containers on DeltaAI provide optimized Python frameworks built for DeltaAI’s H100 GPUs. DeltaAI staff recommend using an NGC container when possible with the GPU nodes (or use the anaconda3_gpu module).

Python (a recent or latest version)

If you don’t need all the extra modules provided by Anaconda, use the basic Python installation provided by Cray or install your own for aarch64. Starting from this smaller installed base, you can add modules via pip3 install --user <modulename>, set up virtual environments, and customize as needed for your workflow.

$ module load cray-python
$ which python
/opt/cray/pe/python/3.11.7/bin/python

cray-python includes: numpy, mpi4py, and pandas.

miniforge3

python/miniforge3_pytorch

Use python from the python/miniforge3_pytorch module if you need some of the modules provided by conda-forge in your python workflow. See the Managing Environments section of the conda getting started guide to learn how to customize conda for your workflow and add extra python modules to your environment.

Note

If you use conda with NGC containers, take care to use python from the container and not python from conda or one of its environments. The container’s python should be first in $PATH. You may --bind the conda directory or other paths into the container so that you can start your conda environments with the container’s python (/usr/bin/python).

The Anaconda archive contains previous Anaconda versions. The bundles are not small, but using one from Anaconda will ensure that you get software that was built to work together. If you require an older version of a Python library or module, NCSA staff suggest looking back in time at the Anaconda site (though this will be a limited timeline because the Grace Hopper aarch64 architecture in DeltaAI is new).

Python Environments with conda

See the Conda configuration documentation if you want to disable automatic conda environment activation.

Note

When using your own custom conda environment with a batch job, submit the batch job from within the environment and do not add conda activate commands to the job script; the job inherits your environment.
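To make the note above concrete, the sketch below writes a small job script that contains no conda activate line and simply reports the environment it inherited at submission time. The #SBATCH values are illustrative, <account_name> is a placeholder you must replace, and my_script.py is a hypothetical workload.

```shell
#!/bin/bash
# Sketch: generate a job script that relies on the environment active at submission.

JOB_SCRIPT="${JOB_SCRIPT:-myjob.slurm}"

cat > "$JOB_SCRIPT" <<'EOF'
#!/bin/bash
#SBATCH --account=<account_name>
#SBATCH --partition=ghx4
#SBATCH --gpus=1
#SBATCH --time=00:30:00
#SBATCH --mem=32g

# No activation command here: the job inherits the environment it was submitted from.
echo "conda environment: ${CONDA_DEFAULT_ENV:-none}"
echo "python in use: $(command -v python)"
python my_script.py   # hypothetical workload
EOF

# Syntax-check the generated script without running it.
bash -n "$JOB_SCRIPT" && echo "wrote $JOB_SCRIPT"
```

After activating your environment in the login shell, submit the file with sbatch; the two echo lines in the job output show whether the environment was inherited as expected.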

Batch Jobs

Batch jobs will honor the commands you execute within them. Purge/unload/load modules as needed for that job.

A clean slate might resemble (user has a conda init clause in bashrc for a custom environment):

conda deactivate
conda deactivate  # just making sure
module reset      # load the default DeltaAI modules

conda activate base
# commands to load modules and activate environs such that your environment is active before
# you use slurm ( do not include conda activate commands in the slurm script )

sbatch myjob.slurm  # or srun or salloc

Non-python/conda HPC users would see per-job stderr from the conda deactivate above (user has never run conda init bash):

[arnoldg@gh-login03 ~]$ conda deactivate
bash: conda: command not found
[arnoldg@gh-login03 ~]$

# or

[arnoldg@gh-login03 ~]$ conda deactivate

CommandNotFoundError: Your shell has not been properly configured to use 'conda deactivate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - tcsh
  - zsh

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.
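If the error output shown above is unwanted, the clean-slate commands can be guarded so they run only when conda and Lmod are actually usable. This is a minimal sketch; clean_slate is a hypothetical helper name, and it relies on conda's CONDA_SHLVL variable, which counts nested activations.

```shell
#!/bin/bash
# Sketch: reset to a clean environment without errors when conda is not set up.

clean_slate() {
    # Deactivate only if conda is available and at least one environment is active.
    if command -v conda >/dev/null 2>&1; then
        while [ "${CONDA_SHLVL:-0}" -gt 0 ]; do
            conda deactivate || break   # stop instead of looping forever on failure
        done
    fi
    # "module" exists only where Lmod is set up.
    if command -v module >/dev/null 2>&1; then
        module reset   # load the default DeltaAI modules
    fi
}

clean_slate
```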

PyTorch

Information on how to set up and run PyTorch.

TensorFlow

Information on how to set up and run TensorFlow.

Containers

See Containers.

Jupyter Notebooks

Warning

This section is under construction.

Note

The DeltaAI Open OnDemand (OOD) dashboard provides an easy method to start a Jupyter notebook; this is the recommended method.

Go to OOD Jupyter interactive app for instructions on how to start an OOD JupyterLab session.

You can also customize your OOD JupyterLab environment.

Do not run Jupyter on the shared login nodes. Instead, follow these steps to attach a Jupyter notebook running on a compute node to your local web browser:

How to Run Jupyter on a Compute Node

The Jupyter notebook executables are in your $PATH after loading a conda-based Python module such as python/miniforge3_pytorch. If you run into problems from a previously saved Jupyter session (for example, you see paths where you do not have write permission), you may remove this file to get a fresh start: $HOME/.jupyter/lab/workspaces/default-*.

Follow these steps to run Jupyter on a compute node (CPU or GPU):

  1. On your local machine/laptop, open a terminal.

  2. SSH into DeltaAI. (Replace <my_deltaai_username> with your DeltaAI login username.)

    ssh <my_deltaai_username>@gh-login.delta.ncsa.illinois.edu
    
  3. Enter your NCSA password and complete the Duo MFA. Note, the terminal will not show your password (or placeholder symbols such as asterisks [*]) as you type.

    Warning

    If there is a conda environment active when you log into DeltaAI, deactivate it before you continue. You will know you have an active conda environment if your terminal prompt has an environment name in parentheses prepended to it, like these examples:

    (base) [<gh-login_username>@gh-login01 ~]$
    
    (mynewenv) [<gh-login_username>@gh-login01 ~]$
    

    Run conda deactivate until there is no longer a name in parentheses prepended to your terminal prompt. When you don’t have any conda environment active, your prompt will look like this:

    [<gh-login_username>@gh-login01 ~]$
    
  4. Load the appropriate anaconda module. To see all of the available anaconda modules, run module avail anaconda. This example uses python/miniforge3_pytorch.

    module load python/miniforge3_pytorch
    
  5. Verify the module is loaded.

    module list
    
  6. Verify that jupyter-notebook is in your $PATH.

    which jupyter-notebook
    
  7. Generate a MYPORT number and copy it to a notepad (you will use it in steps 9 and 12).

    MYPORT=$(($(($RANDOM % 10000))+49152)); echo $MYPORT
    
  8. Find the account_name that you are going to use and copy it to a notepad (you will use it in step 9); your accounts are listed under Project when you run the accounts command.

    accounts
    
  9. Run the following srun command, with these replacements:

    • Replace <account_name> with the account you are going to use, which you found and copied in step 8.

    • Replace <$MYPORT> with the $MYPORT number you generated in step 7.

    • Modify the --partition, --gpus, --time, and --mem options and/or add other options to meet your needs.

    srun --account=<account_name> --partition=ghx4 --gpus=1 --time=00:30:00 --mem=32g jupyter-notebook --no-browser --port=<$MYPORT> --ip=0.0.0.0
    
  10. Copy the last 5 lines returned, beginning with “To access the notebook, open this file in a browser…”, to a notepad (you will use this information in steps 12 and 14). (It may take a few minutes for these lines to be returned.)

    Note these two things about the URLs you copied:

    • The first URL begins with http://<ghXXX>.delta...; <ghXXX> is the internal hostname and will be used in step 12.

    • The second URL begins with http://127.0...; you will use this entire URL in step 14.

  11. Open a second terminal on your local machine/laptop.

  12. Run the following ssh command, with these replacements:

    • Replace <my_deltaai_username> with your DeltaAI login username.

    • Replace <$MYPORT> with the $MYPORT number you generated in step 7.

    • Replace <ghXXX> with the internal hostname you copied in step 10.

    ssh -l <my_deltaai_username> -L 127.0.0.1:<$MYPORT>:<ghXXX>.delta.ncsa.illinois.edu:<$MYPORT> gh-login.delta.ncsa.illinois.edu
    
  13. Enter your NCSA password and complete the Duo MFA. Note, the terminal will not show your password (or placeholder symbols such as asterisks [*]) as you type.

  14. Copy and paste the entire second URL from step 10 (begins with http://127.0...) into your browser. You will be connected to the Jupyter instance running on your DeltaAI compute node.

    Jupyter screenshot
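The port, srun, and tunnel steps above can be sketched as a dry-run helper that only prints the commands, so you can check the substitutions before running anything. The variable names (DELTAAI_USER, ACCOUNT, NODE) and function names are hypothetical, and the default values are placeholders you must replace.

```shell
#!/bin/bash
# Dry-run helper: prints the commands from the steps above; it runs neither srun nor ssh.

# Hypothetical placeholders; override them in the environment before running.
DELTAAI_USER="${DELTAAI_USER:-my_deltaai_username}"
ACCOUNT="${ACCOUNT:-account_name}"
NODE="${NODE:-ghXXX}"   # internal hostname reported by Jupyter in step 10

# Same arithmetic as step 7: a port in the range 49152-59151.
MYPORT="${MYPORT:-$(( RANDOM % 10000 + 49152 ))}"

srun_cmd() {
    echo "srun --account=${ACCOUNT} --partition=ghx4 --gpus=1 --time=00:30:00 --mem=32g" \
         "jupyter-notebook --no-browser --port=${MYPORT} --ip=0.0.0.0"
}

tunnel_cmd() {
    echo "ssh -l ${DELTAAI_USER} -L 127.0.0.1:${MYPORT}:${NODE}.delta.ncsa.illinois.edu:${MYPORT}" \
         "gh-login.delta.ncsa.illinois.edu"
}

echo "Step 9 (first terminal, on DeltaAI):"
srun_cmd
echo "Step 12 (second terminal, on your local machine):"
tunnel_cmd
```

Because the node name is known only after the job starts, run the helper once to get the srun command, then set NODE (keeping the same MYPORT) and run it again to get the tunnel command.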

How to Run Jupyter on a Compute Node, in an NGC Container

Follow these steps to run Jupyter on a compute node, in an NGC container:

  1. On your local machine/laptop, open a terminal.

  2. SSH into DeltaAI. (Replace <my_deltaai_username> with your DeltaAI login username.)

    ssh <my_deltaai_username>@gh-login.delta.ncsa.illinois.edu
    
  3. Enter your NCSA password and complete the Duo MFA. Note, the terminal will not show your password (or placeholder symbols such as asterisks [*]) as you type.

  4. Generate a $MYPORT number and copy it to a notepad (you will use it in steps 6, 8, and 14).

    MYPORT=$(($(($RANDOM % 10000))+49152)); echo $MYPORT
    
  5. Find the account_name that you are going to use and copy it to a notepad (you will use it in step 6); your accounts are listed under Project when you run the accounts command.

    accounts
    
  6. Run the following srun command, with these replacements:

    • Replace <account_name> with the account you are going to use, which you found and copied in step 5.

    • Replace <project_path> with the name of your projects folder (in two places).

    • Replace <$MYPORT> with the MYPORT number you generated in step 4.

    • Modify the --partition, --gpus, --time, --mem, and --gpus-per-node options and/or add other options to meet your needs.

    srun --account=<account_name> --partition=ghx4-interactive --gpus=1 --time=00:30:00 --mem=64g --gpus-per-node=1 apptainer run --nv --bind /projects/<project_path> /sw/user/NGC_containers/pytorch_24.07-py3.sif jupyter-notebook --notebook-dir /projects/<project_path> --no-browser --port=<$MYPORT> --ip=0.0.0.0
    
  7. Copy the last 2 lines returned (beginning with “Or copy and paste this URL…”) to a notepad. (It may take a few minutes for these lines to be returned.)

  8. Modify the URL you copied in step 7 by changing hostname:8888 to 127.0.0.1:<$MYPORT>. You will use the modified URL in step 16. (Replace <$MYPORT> with the $MYPORT number you generated in step 4.)

  9. Open a second terminal.

  10. SSH into DeltaAI. (Replace <my_deltaai_username> with your DeltaAI login username.)

    ssh <my_deltaai_username>@gh-login.delta.ncsa.illinois.edu
    
  11. Enter your NCSA password and complete the Duo MFA. Note, the terminal will not show your password (or placeholder symbols such as asterisks [*]) as you type.

  12. Find the internal hostname for your job and copy it to a notepad (you will use it in step 14).

    squeue -u $USER
    

    The value returned under NODELIST is the internal hostname for your GPU job (ghXXX). You can now close this terminal.

  13. Open a third terminal.

  14. Run the following ssh command, with these replacements:

    • Replace <my_deltaai_username> with your DeltaAI login username.

    • Replace <$MYPORT> with the $MYPORT number you generated in step 4.

    • Replace <ghXXX> with the internal hostname you copied in step 12.

    ssh -l <my_deltaai_username> -L 127.0.0.1:<$MYPORT>:<ghXXX>.delta.internal.ncsa.edu:<$MYPORT> gh-login.delta.ncsa.illinois.edu
    
  15. Enter your NCSA password and complete the Duo MFA. Note, the terminal will not show your password (or placeholder symbols such as asterisks [*]) as you type.

  16. Copy and paste the entire modified URL (beginning with http://127.0...) from step 8 into your browser. You will be connected to the Jupyter instance running on your GPU node of DeltaAI.

    Jupyter screenshot
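The URL edit in step 8 can also be done with a one-line substitution instead of by hand. The sketch below assumes the copied URL has the form http://host:port/...; rewrite_url is a hypothetical helper name, and the token in the example is made up.

```shell
#!/bin/bash
# Sketch: point the URL printed by Jupyter at the local end of the SSH tunnel.

rewrite_url() {
    # $1: URL copied in step 7; $2: the MYPORT value from step 4.
    # Replace everything between the scheme and the next "/" with 127.0.0.1:<port>.
    printf '%s\n' "$1" | sed -E "s#^(https?://)[^/]+#\1127.0.0.1:$2#"
}

# Example with a made-up token; a real token comes from your own Jupyter output.
rewrite_url "http://hostname:8888/?token=abc123" 50000
# prints: http://127.0.0.1:50000/?token=abc123
```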

List of Installed Software (CPU & GPU)

See: module avail.