Installed Software
Software on TGI RAILS is provided through the Lmod module system. RAILS provides many common software packages through modules, such as CUDA, MPI, and Python. Access to system-installed compilers such as GCC, Intel, and NVIDIA is also provided this way. Modules make it easy for users to get set up and started quickly on development, data analysis, and scientific computing without needing to configure their environment manually.
If you need software that is not already provided on the system, or you need help installing software yourself, please open a service request ticket by sending email to help+tgi@ncsa.illinois.edu.
For single-user or single-project use cases, the preference is for users to install the software in their own user space. TGI RAILS staff review general installation requests on a case-by-case basis.
Modules (Lmod)
The user environment is controlled using the modules environment management system. Modules may be loaded, unloaded, or swapped either on the command line or in your $HOME/.bashrc (.cshrc for csh) shell startup file.
The module command is a user interface to the Lmod package. Lmod provides for the dynamic modification of the user's environment via modulefiles (a modulefile contains the information needed to configure the shell for an application). Modules are independent of the user's shell, so both tcsh and bash users can use the same commands to change the environment.
| Command | Description |
|---|---|
| `module avail` | lists all available modules |
| `module list` | lists currently loaded modules |
| `module spider` | display the available modules on the system one page at a time |
| `module keyword foo` | search for modules named foo |
| `module help modulefile` | help on module modulefile |
| `module display modulefile` | display information about modulefile |
| `module load modulefile` | load modulefile into current shell environment |
| `module unload modulefile` | remove modulefile from current shell environment |
| `module swap modulefile1 modulefile2` | unload modulefile1 and load modulefile2 |
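A typical session combining these commands might look like the following sketch (module names are taken from the module listing later on this page; available versions may differ):

```shell
$ module list                          # currently loaded modules
$ module avail gcc                     # which GCC versions exist
$ module load gcc/11.4.0               # load the default GCC
$ module load openmpi/4.1.6 python/3.11.6
$ module swap gcc/11.4.0 gcc/12.3.0    # switch compiler versions
$ module unload python/3.11.6
```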
To include a particular software stack in your default environment for TGI RAILS login and compute nodes:
1. Log into a TGI RAILS login node.
2. Manipulate your modulefile stack until satisfied.
3. Run `module save`; this will create a .lmod.d/default file that will be loaded on the TGI RAILS login or compute nodes on your next login or job execution.
| Command | Description |
|---|---|
| `module save` | save current modulefile stack to ~/.lmod.d/default |
| `module save collection_name` | save current modulefile stack to ~/.lmod.d/collection_name |
| `module restore` | load ~/.lmod.d/default if it exists, or the system default |
| `module restore collection_name` | load your ~/.lmod.d/collection_name |
| `module reset` | reset your modulefiles to the system default |
| `module disable collection_name` | disable collection_name by adding collection_name~ |
| `module savelist` | list all your ~/.lmod.d collections |
| `module describe collection_name` | list collection_name modulefiles |
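For example, a hypothetical workflow that saves a GPU development stack as a named collection and restores it later (the collection name gpu-dev is illustrative) might look like:

```shell
$ module load gcc/11.4.0 cuda/11.8.0 openmpi/4.1.6
$ module save gpu-dev       # writes ~/.lmod.d/gpu-dev
$ module reset              # back to the system default
$ module restore gpu-dev    # reload the saved stack
$ module savelist           # list your saved collections
```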
See the User Guide for Lmod for more help on the module system.
List of currently available modules
```text
------------------ /sw/spack/v1/modules/lmod/openmpi/4.1.6-4ukqgsw/gcc/11.4.0 -------------------
fftw/3.3.10   hdf5/1.14.3   osu-micro-benchmarks/7.3   parallel-netcdf/1.12.3
----------------------------- /sw/spack/v1/modules/lmod/gcc/11.4.0 ------------------------------
cuda/11.8.0   openmpi/4.1.6   python/3.11.6
---------------------------------------- /sw/modulefiles ----------------------------------------
StdEnv                    matlab/2024a             scripts/script_paths
conda/pytorch/2.6.0       pkgs/anaconda3/23.11.0   user/license_file
conda/tensorflow/2.19.0   python/pytorch/2.2.0     user/user_paths
----------------------------- /usr/share/lmod/lmod/modulefiles/Core -----------------------------
lmod   settarg
-------------------------------- /sw/spack/v1/modules/lmod/Core ---------------------------------
cmake/3.27.7   intel-oneapi-compilers/2023.0.0   nvhpc/23.9
gcc/11.4.0     intel-oneapi-compilers/2023.2.1
gcc/12.3.0     intel-oneapi-mkl/2023.2.0
```
Compilers
TGI RAILS provides the GCC 11.4 compiler by default through the gcc/11.4.0 module. The Intel oneAPI compilers and the NVIDIA HPC (nvhpc) compilers are also provided by the system through their respective modules.
For more on compiling your own software on RAILS, see the Programming Environment (Building Software) page.
Python
Python on RAILS is provided through a combination of modules and preconfigured conda environments, allowing users flexibility for a variety of workflows.
The primary Python module available is python/3.11.6, which serves as a base environment and can be used to create or bootstrap your own Python environments. System Python installations on RAILS use the conda-forge channel as the package provider, enabling users to easily install and manage additional packages within their personal environments.
RAILS also provides a set of preconfigured Python environments via Conda modules. These include environments optimized for machine learning and scientific computing:

- conda/pytorch/2.6.0
- conda/tensorflow/2.19.0
You can use these modules to get started on RAILS without installing your own Python environment. The RAILS team works to keep these modules updated with the latest packages.
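For example, loading one of these environments and checking the framework version might look like the following (module name from the list above; the printed version will depend on the module):

```shell
$ module load conda/pytorch/2.6.0
$ python -c "import torch; print(torch.__version__)"
```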
Python and Containers
If you use Python with containers, take care to use the Python from the container and not the Python from Anaconda or one of its environments. The container's Python should be first in $PATH. You may --bind the Anaconda directory or other paths into the container so that you can start your conda environments, but with the container's Python (/usr/bin/python).
Older versions of Python and modules
The Anaconda Archive contains previous Anaconda versions. The bundles are not small, but using one from Anaconda ensures that you get software that was built to work together at a point in time. If you require an older version of a Python library or module, we suggest looking back in time at the Anaconda site.
Installing your own Python environment
On RAILS, you may install your own Python software stacks as needed. There are a few choices when customizing your Python setup. You may use any of these methods with any of the Python versions or instances described below (or you may install your own Python versions):
- venv (Python virtual environments): venv allows you to create isolated Python environments, each with its own set of installed packages. Use it to keep project dependencies separate and avoid conflicts between different projects.
- conda: Conda allows you to create isolated environments with different Python versions and packages, much like venv, but provides more flexibility by supporting both Python and non-Python dependencies. It is especially useful for managing complex project requirements, making it easy to reproduce software stacks and avoid dependency conflicts. See comparison for more details.
- uv: uv is a fast, modern Python package manager designed for high performance and ease of use. Compared to conda or pip, uv installs packages much more quickly, uses less disk space, and is compatible with existing Python workflows. It provides features like workspace management and deterministic builds, making it a good choice for users who want speed and reliability without the overhead of conda environments.
- pip3: `pip3 install --user <python_package>` is useful when you need just one Python environment per Python version or instance.
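As a sketch of the venv route (the environment path is illustrative; on RAILS you would typically run module load python/3.11.6 first):

```shell
# create an isolated environment (path is illustrative)
python3 -m venv "$HOME/envs/demo"
# activate it for the current shell session
. "$HOME/envs/demo/bin/activate"
# packages now install into the venv, not the system Python
python -c "import sys; print(sys.prefix)"
deactivate
```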
A couple of examples using all of the above, covering scikit-learn-intelex (an Intel-accelerated scikit-learn subset library for the x86_64 architecture), are shown at the Scikit-learn GitHub repository.
Containers
Container support on RAILS is provided by Apptainer.
Docker images can be converted to Apptainer sif format via the apptainer pull command. Commands can be run from within a container using the apptainer run command.
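For example, pulling and running a container might look like the following (the image name and tag are illustrative, not a RAILS-provided image):

```shell
$ apptainer pull pytorch.sif docker://nvcr.io/nvidia/pytorch:24.01-py3
$ apptainer run --nv pytorch.sif
```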
If you encounter $HOME quota issues with Apptainer caching in ~/.apptainer, the environment variable APPTAINER_CACHEDIR can be used to select a different location, such as a directory under /scratch. See also: apptainer build environment.
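For example, a sketch for a shell startup file (the scratch path is illustrative):

```shell
# keep Apptainer's image cache out of $HOME (path is illustrative)
export APPTAINER_CACHEDIR=/scratch/your_project/$USER/apptainer_cache
mkdir -p "$APPTAINER_CACHEDIR"
```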
Your $HOME is automatically available from containers run via Apptainer. You can pip3 install --user against a container's Python, set up virtual environments, or similar while using a containerized application. Just run the container's /bin/bash to get an Apptainer> prompt (or use apptainer shell <container> for a quick look from a login node). Below is an srun example of that with TensorFlow:
```shell
# match --account to a "Project" returned by the "accounts" command
$ srun \
    --mem=32g \
    --nodes=1 \
    --ntasks-per-node=1 \
    --cpus-per-task=16 \
    --partition=gpuA100x4-interactive \
    --account=account_name \
    --gpus-per-node=1 \
    --gpus-per-task=1 \
    --gpu-bind=verbose,per_task:1 \
    --pty \
    apptainer run --nv --bind /projects/bbXX \
    /sw/external/NGC/tensorflow:22.06-tf2-py3 /bin/bash
# job starts ...
Apptainer> hostname
gpua068.delta.internal.ncsa.edu
Apptainer> which python   # the python in the container
/usr/bin/python
Apptainer> python --version
Python 3.8.10
```