PyTorch on DeltaAI

Summary

  • Run PyTorch in the NGC container.

  • pip install into conda or venv environments.

    • Only one pip install invocation currently works on DeltaAI. Follow this documentation to install and run PyTorch.

  • The module should support Open OnDemand and Jupyter without any issues.

Run PyTorch

There are currently two options to run PyTorch on DeltaAI:

Use the NGC Container

The container image is located at /sw/user/NGC_containers/pytorch_24.09-py3.sif. The container should run without warnings. The following is example output from a PyTorch job run in the container.

job is starting on gh004

=============
== PyTorch ==
=============

NVIDIA Release 24.09 (build 100464920)
PyTorch Version 2.4.0a0+3bcc3cd
Container image Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2014-2024 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.5 driver version 555.42.06 with kernel driver version 535.129.03.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

CUDA device? True
HIP device:  True None
99 3161.41015625
199 1145.439208984375
299 424.538330078125

pip Install into a Conda or venv Environment

The following pip install command installs a PyTorch build that runs on DeltaAI.

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
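After the install finishes, it can help to confirm which build the active environment actually picked up. The following is a minimal sketch, not part of the DeltaAI documentation: the helper name torch_report is hypothetical, and it relies only on standard PyTorch attributes (torch.__version__, torch.version.cuda, torch.cuda.is_available, torch.cuda.device_count). With the cu124 index above, the CUDA build string would be expected to start with 12.4, though that is an assumption about the wheel rather than something shown here.

```python
import importlib.util


def torch_report():
    """Describe the PyTorch install visible to this interpreter (hypothetical helper)."""
    # Check for the package first so the script still runs in an
    # environment where the pip install has not been done yet.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        # True only if a GPU is visible, so run this on a compute node.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in torch_report().items():
        print(f"{key}: {value}")
```

Run it with the same python3 that pip3 installed into. On a login node without GPUs, cuda_available is expected to be False even when the install itself is fine.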

PyTorch with miniforge3

The following is an example using pip install with a miniforge3 module that the DeltaAI admins have set up.

module use /sw/user/modules/python
module load python/miniforge3_pytorch
time srun \
  python3 tensor_gpu.py


CUDA device? True
HIP device:  True None
99 5007.66064453125
199 1793.055908203125
299 651.1307983398438
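The contents of tensor_gpu.py are not shown on this page. The following is a hypothetical stand-in that prints output of the same shape, assuming the script follows the polynomial-fitting example from the PyTorch tutorials (fitting a cubic to sin(x) by gradient descent). The function name fit_sine, the step count, and the learning rate are assumptions, and the loss values differ from run to run because the starting coefficients are random.

```python
"""Hypothetical stand-in for tensor_gpu.py (the real script is not shown)."""
import math

try:
    import torch
except ImportError:  # e.g. the container or module has not been loaded
    torch = None


def fit_sine(device, steps=300, lr=1e-6):
    """Fit a + b*x + c*x^2 + d*x^3 to sin(x); return (step, loss) every 100 steps."""
    x = torch.linspace(-math.pi, math.pi, 2000, device=device)
    y = torch.sin(x)
    # Random starting coefficients, created directly on the target device.
    a, b, c, d = (torch.randn((), device=device) for _ in range(4))
    history = []
    for t in range(steps):
        err = (a + b * x + c * x**2 + d * x**3) - y
        if t % 100 == 99:
            history.append((t, err.pow(2).sum().item()))
        # Manual gradients of the summed squared error, computed
        # for all four coefficients before any of them is updated.
        grad_y = 2.0 * err
        grad_a = grad_y.sum()
        grad_b = (grad_y * x).sum()
        grad_c = (grad_y * x**2).sum()
        grad_d = (grad_y * x**3).sum()
        a -= lr * grad_a
        b -= lr * grad_b
        c -= lr * grad_c
        d -= lr * grad_d
    return history


def main():
    if torch is None:
        print("torch is not importable; load the container or module first")
        return
    print("CUDA device?", torch.cuda.is_available())
    # torch.version.hip is None on CUDA builds of PyTorch.
    print("HIP device: ", torch.cuda.is_available(), torch.version.hip)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for step, loss in fit_sine(device):
        print(step, loss)


if __name__ == "__main__":
    main()
```

A falling loss across the three printed steps, together with "CUDA device? True", suggests the job is running on the GPU with a working PyTorch build.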