Installed Software

Software on TGI RAILS is provided through the Lmod module system. RAILS provides many common software packages, such as CUDA, MPI, and Python, through modules. Access to system-installed compilers such as GCC, Intel, and NVIDIA is also provided this way. Modules make it easy for users to get set up and started quickly on development, data analysis, and scientific computing without the need to configure their environment manually.

If you need software that is not already provided on the system, or you need help installing software yourself, please open a service request ticket by sending email to help+tgi@ncsa.illinois.edu.

For single-user or single-project use cases, the preference is for the user to install the software in their own user space. For general installation requests, TGI RAILS staff will review each request on a case-by-case basis.

Modules (Lmod)

The user environment is controlled using the modules environment management system. Modules may be loaded, unloaded, or swapped either on the command line or in your $HOME/.bashrc (.cshrc for csh) shell startup file.

The module command is a user interface to the Lmod package. The Lmod package provides for the dynamic modification of the user’s environment via modulefiles (a modulefile contains the information needed to configure the shell for an application). Modules are independent of the user’s shell, so both tcsh and bash users can use the same commands to change the environment.

Useful Module Commands

Command                                Description
-----------------------------------    -----------------------------------------------------
module avail                           lists all available modules
module list                            lists currently loaded modules
module avail | more                    displays the available modules one page at a time
module spider foo                      searches for modules named foo
module help modulefile                 shows help for modulefile
module display modulefile              displays information about modulefile
module load modulefile                 loads modulefile into the current shell environment
module unload modulefile               removes modulefile from the current shell environment
module swap modulefile1 modulefile2    unloads modulefile1 and loads modulefile2
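As a sketch, a typical session on a login node might combine several of these commands (the module names and versions below are illustrative; check module avail on the system for what is actually installed):

```shell
# see what is currently loaded
module list

# search for available GCC versions
module spider gcc

# load the CUDA and OpenMPI modules
module load cuda/11.8.0
module load openmpi/4.1.6

# swap the default GCC for a newer one
module swap gcc/11.4.0 gcc/12.3.0

# remove a module when finished with it
module unload cuda/11.8.0
```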

To include a particular software stack in your default environment for TGI RAILS login and compute nodes:

  1. Log into a TGI RAILS login node.

  2. Manipulate your modulefile stack until satisfied.

  3. Run module save; this will create a .lmod.d/default file that will be loaded on the TGI RAILS login or compute nodes on your next login or job execution.

Useful User Defined Module Collections

Command                            Description
-------------------------------    ----------------------------------------------------------------
module save                        saves the current modulefile stack to ~/.lmod.d/default
module save collection_name        saves the current modulefile stack to ~/.lmod.d/collection_name
module restore                     loads ~/.lmod.d/default if it exists, otherwise the system default
module restore collection_name     loads your ~/.lmod.d/collection_name
module reset                       resets your modulefiles to the system default
module disable collection_name     disables collection_name by renaming it to collection_name~
module savelist                    lists all of your ~/.lmod.d collections
module describe collection_name    lists the modulefiles in collection_name
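For example, saving a GPU development stack as a named collection and restoring it later might look like this (the collection name and module versions are illustrative):

```shell
# build up the stack you want
module load gcc/11.4.0 cuda/11.8.0 openmpi/4.1.6

# save it under a name for later reuse
module save gpu-dev

# later, in a fresh shell or batch job, restore it
module restore gpu-dev

# see which collections you have saved
module savelist
```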

See the User Guide for Lmod for more help on the module system.

List of currently available modules

------------------ /sw/spack/v1/modules/lmod/openmpi/4.1.6-4ukqgsw/gcc/11.4.0 -------------------
   fftw/3.3.10    hdf5/1.14.3    osu-micro-benchmarks/7.3    parallel-netcdf/1.12.3

----------------------------- /sw/spack/v1/modules/lmod/gcc/11.4.0 ------------------------------
   cuda/11.8.0    openmpi/4.1.6  python/3.11.6

---------------------------------------- /sw/modulefiles ----------------------------------------
   StdEnv                         matlab/2024a              scripts/script_paths
   conda/pytorch/2.6.0            pkgs/anaconda3/23.11.0    user/license_file
   conda/tensorflow/2.19.0        python/pytorch/2.2.0      user/user_paths

----------------------------- /usr/share/lmod/lmod/modulefiles/Core -----------------------------
   lmod    settarg

-------------------------------- /sw/spack/v1/modules/lmod/Core ---------------------------------
   cmake/3.27.7          intel-oneapi-compilers/2023.0.0        nvhpc/23.9
   gcc/11.4.0            intel-oneapi-compilers/2023.2.1
   gcc/12.3.0            intel-oneapi-mkl/2023.2.0

Compilers

TGI RAILS provides the GCC 11.4 compiler by default through the gcc/11.4.0 module. Also provided by the system are the Intel oneAPI compilers and the NVIDIA HPC compilers through their respective modules.

For more on compiling your own software on RAILS, see the Programming Environment (Building Software) page.

Python

Python on RAILS is provided through a combination of modules and preconfigured conda environments, allowing users flexibility for a variety of workflows.

The primary Python module available is python/3.11.6, which serves as a base environment and can be used to create or bootstrap your own Python environments. System Python installations on RAILS use the conda-forge channel as the package provider, enabling users to easily install and manage additional packages within their personal environments.
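As a sketch, the base module can be used to bootstrap a personal virtual environment like this (the environment path and package names are illustrative):

```shell
# load the system Python module
module load python/3.11.6

# create and activate a virtual environment in your home directory
python -m venv ~/envs/myproject
source ~/envs/myproject/bin/activate

# install additional packages into your personal environment
pip install numpy scipy
```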

RAILS also provides a set of preconfigured Python environments via Conda modules. These include environments optimized for machine learning and scientific computing:

  - conda/pytorch/2.6.0
  - conda/tensorflow/2.19.0

You can use these modules to get started on RAILS without installing your own Python environment. The RAILS team works to update these modules frequently with the latest packages.
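A quick check that a preconfigured environment works might look like this (versions as listed above):

```shell
# load the preconfigured PyTorch environment
module load conda/pytorch/2.6.0

# confirm the framework imports and report its version
python -c "import torch; print(torch.__version__)"
```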

Python and Containers

If you use Python with containers, take care to use the Python from the container and not the Python from Anaconda or one of its environments. The container's Python should be first in $PATH. You may --bind the Anaconda directory or other paths into the container so that you can start your conda environments, but with the container's Python (/usr/bin/python).
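For example, inside a container shell you can verify which Python will actually run (the container filename and bind path here are illustrative):

```shell
# open a shell in the container, binding your conda install if needed
apptainer shell --bind ~/anaconda3 my_container.sif

# inside the container, confirm the container's Python comes first in $PATH
Apptainer> which python
Apptainer> echo $PATH
```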

Older versions of Python and modules

The Anaconda Archive contains previous Anaconda versions. The bundles are not small, but using one from Anaconda ensures that you get software that was built to work together at a point in time. If you require an older version of a Python library or module, we suggest looking back in time at the Anaconda site.

Installing your own Python environment

On RAILS, you may install your own Python software stacks as needed. There are a few choices when customizing your Python setup. You may use any of these methods with any of the Python versions or instances described above (or you may install your own Python versions):

  1. venv (Python virtual environment)

     venv allows you to create isolated Python environments, each with its own set of installed packages. Use it to keep project dependencies separate and avoid conflicts between different projects.

  2. conda environments

     Conda allows you to create isolated environments with different Python versions and packages, much like venv, but provides more flexibility by supporting both Python and non-Python dependencies. It’s especially useful for managing complex project requirements, making it easy to reproduce software stacks and avoid dependency conflicts. See the comparison for more details.

  3. uv (a fast, modern, user-friendly Python package manager)

     uv is designed for high performance and ease of use. Compared to conda or pip, uv installs packages much more quickly, uses less disk space, and is compatible with existing Python workflows. It provides features like workspace management and deterministic builds, making it a good choice for users who want speed and reliability without the overhead of conda environments.

  4. pip3: pip3 install --user <python_package>

     Useful when you need just one Python environment per Python version or instance.
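As a minimal, generic sketch of option 1 (assuming python3 is on your PATH; the environment name is arbitrary):

```shell
# create an isolated virtual environment
python3 -m venv "$HOME/demo-venv"

# activate it; python and pip now resolve inside the venv
. "$HOME/demo-venv/bin/activate"
which python

# the interpreter prefix points at the venv, not the system install
python -c "import sys; print(sys.prefix)"

# return to your normal environment when done
deactivate
```

Packages installed while the venv is active go into the venv's own site-packages, keeping each project's dependencies separate.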

A couple of examples using all of the above are shown at this site, covering scikit-learn-intelex (an Intel-accelerated scikit-learn subset library for the x86_64 architecture): Scikit-learn GitHub repository

Containers

Container support on RAILS is provided by Apptainer.

Docker images can be converted to Apptainer SIF format via the apptainer pull command. Commands can be run from within a container using the apptainer run command.
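For example, pulling a public image from Docker Hub and running it might look like this (the image and output filename are illustrative):

```shell
# convert a Docker image into a local SIF file
apptainer pull my_image.sif docker://ubuntu:22.04

# run the container's default command
apptainer run my_image.sif
```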

If you encounter $HOME quota issues with Apptainer caching in ~/.apptainer, the environment variable APPTAINER_CACHEDIR can be used to select a different location, such as a directory under /scratch. See also the Apptainer build environment documentation.
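A sketch of redirecting the cache (the scratch path is illustrative; adjust it to your project's allocation):

```shell
# point Apptainer's cache at scratch instead of $HOME
export APPTAINER_CACHEDIR=/scratch/$USER/apptainer_cache
mkdir -p "$APPTAINER_CACHEDIR"
```

Add the export line to your shell startup file to make the change persistent across logins.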

Your $HOME is automatically available from containers run via Apptainer. You can pip3 install --user against a container’s Python, set up virtual environments, or similar while using a containerized application. Just run the container’s /bin/bash to get an Apptainer> prompt (or use apptainer shell <container> for a quick look from a login node). Below is an srun example of that with TensorFlow:

$ srun \
 --mem=32g \
 --nodes=1 \
 --ntasks-per-node=1 \
 --cpus-per-task=16 \
 --partition=gpuA100x4-interactive \
 --account=account_name \    # <- match to a "Project" returned by the "accounts" command
 --gpus-per-node=1 \
 --gpus-per-task=1 \
 --gpu-bind=verbose,per_task:1 \
 --pty \
 apptainer run --nv --bind /projects/bbXX \
 /sw/external/NGC/tensorflow:22.06-tf2-py3 /bin/bash
# job starts ...
Apptainer> hostname
gpua068.delta.internal.ncsa.edu
Apptainer> which python  # the python in the container
/usr/bin/python
Apptainer> python --version
Python 3.8.10