Diamond — Harnessing GPU Resources for Scientific Deep Learning
Diamond (diamondhpc.ai) is a web platform and orchestration layer that abstracts the complexities of High-Performance Computing (HPC) and Deep Learning (DL) workflows. It provides a unified graphical user interface across heterogeneous clusters, so researchers and engineers can develop, debug, and deploy large-scale models, and can share and reuse deep learning workload environments with collaborators to reduce redundant setup effort.
Diamond is built on top of the Globus Compute framework, which provides a unified interface for managing compute resources across a variety of clusters.
For more information, see Diamond’s documentation and the eScience 2025 conference paper “Diamond: Harnessing GPU Resources for Scientific Deep Learning”.
Prerequisites
Before you begin, ensure you have the following in place:
Cluster Allocation: You must have a valid allocation on an HPC resource (such as NCSA Delta). If you don’t have one, visit ACCESS to set up your account.
Globus Compute Endpoint:
Use the Delta Multi-user Endpoint or DeltaAI Multi-user Endpoint for a quick start.
Alternatively, follow the Globus Compute documentation to set up a personal endpoint on the cluster.
Quickstart
Sign In: Navigate to diamondhpc.ai and log in via Globus Auth. Choose your preferred identity provider to get started.
Explore the Dashboard: Your command center for recent activities, active tasks, datasets, images, and endpoint status.
Activate Endpoints: Head to the Endpoints page. Find the endpoint you need in the “Available” tab, add it to your managed list, and confirm that the endpoint status is Online.
Navigate: Use the sidebar menu to jump between building containers, transferring data, and monitoring tasks.
Key Features
Endpoints Management
Leverage Globus Compute endpoints to securely connect Diamond to your cluster, and execute remote functions across allocated resources.
Discovery: Add endpoints from the “Available” tab. Once added, Diamond automatically fetches account and partition information from the host machine (this may take a moment).
Configuration: Managed endpoints appear in the “Managed” tab. Important: Set a Diamond Work Directory here to ensure all job logs are organized and saved in your preferred path.
Activation: Check the endpoint status and ensure your endpoint is live. If the endpoint status is offline, log in to the cluster and run:
globus-compute-endpoint start <your-endpoint-name>
Note: If you are using the Delta / DeltaAI Multi-user Endpoint, you can skip this step as the endpoint is always online. Once you add it to your managed list and set a Diamond Work Directory, you can start using Diamond to submit jobs to the cluster.
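Once an endpoint reports Online, one way to confirm that it actually executes code on the cluster is to submit a trivial function through the Globus Compute SDK. The following is a minimal sketch and not part of Diamond itself: `remote_hostname`, `probe`, and `probe_globus_endpoint` are hypothetical helper names, and the endpoint UUID is a placeholder you would copy from your own endpoint. It assumes the `globus-compute-sdk` package is installed and that you have already authenticated with Globus.

```python
from concurrent.futures import Executor


def remote_hostname() -> str:
    """Return the hostname of the machine that runs this function."""
    # Import inside the function so the module is available wherever it executes.
    import platform

    return platform.node()


def probe(executor: Executor, timeout: float = 120.0) -> str:
    """Submit remote_hostname to a concurrent.futures-style executor and wait."""
    future = executor.submit(remote_hostname)
    return future.result(timeout=timeout)


def probe_globus_endpoint(endpoint_id: str) -> str:
    """Run the probe on a Globus Compute endpoint (requires globus-compute-sdk)."""
    # Imported lazily so the rest of this sketch works without the SDK installed.
    from globus_compute_sdk import Executor as GlobusExecutor

    with GlobusExecutor(endpoint_id=endpoint_id) as gce:
        return probe(gce)
```

Calling `probe_globus_endpoint("<your-endpoint-uuid>")` should return the name of a cluster node if the endpoint is reachable. Multi-user endpoints may need additional per-user configuration when the executor is created; check the Globus Compute documentation for the endpoint you are using.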
Image Builder
A portable container solution enabling reproducible deep learning workflows.
Library: Access a mix of your private builds and public images hosted by Diamond.
Custom Builds: If you need a custom image, click “Build New Image” and follow the guided prompts to generate a custom Apptainer image for your workflow.
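For readers unfamiliar with Apptainer, an image is typically built from a short definition file. The sketch below renders such a file from a few inputs to show the kind of information a guided build collects. It is an illustration under stated assumptions, not Diamond's actual build logic: `render_apptainer_def` is a hypothetical helper, and the base image and package names in the usage note are placeholders.

```python
from typing import Sequence


def render_apptainer_def(
    base_image: str, pip_packages: Sequence[str], run_command: str
) -> str:
    """Render a minimal Apptainer definition file as text."""
    # Header: pull the base image from a Docker registry.
    sections = ["Bootstrap: docker", f"From: {base_image}"]
    if pip_packages:
        # %post runs once at build time, inside the container.
        install = "    pip install --no-cache-dir " + " ".join(pip_packages)
        sections += ["", "%post", install]
    # %runscript defines what `apptainer run` executes.
    sections += ["", "%runscript", f'    exec {run_command} "$@"']
    return "\n".join(sections) + "\n"
```

For example, `render_apptainer_def("python:3.11-slim", ["numpy"], "python")` yields text that could be saved as `image.def` and built with `apptainer build image.sif image.def`.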
Datasets
Manage and explore datasets in a centralized registry: Diamond collections.
Registry: View all Diamond collections currently available to your account.
Globus Integration: Register new datasets directly as Globus collections to streamline high-speed data movement.
Tasks & Job Submission
Submit and manage compute jobs across HPC resources.
Submit: Click “Submit New Task” to launch a Slurm job. Our interface guides you through the configuration, so you don’t have to write batch scripts from scratch.
Real-time Monitoring: Once submitted, watch your job's progress in real time. Diamond streams output and error logs directly to your browser.
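To make concrete what the submission form saves you from writing, here is a minimal sketch of a Slurm batch script assembled from the kinds of fields described above (account, partition, container image, work directory). It is an assumption about the general shape of such a script, not the script Diamond generates: `SlurmTask` and `render_batch_script` are hypothetical names, and all values passed in are placeholders.

```python
import shlex
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SlurmTask:
    job_name: str
    account: str
    partition: str
    image_path: str          # path to an Apptainer .sif image
    command: Sequence[str]   # command to run inside the container
    work_dir: str            # directory where log files are written
    nodes: int = 1
    gpus_per_node: int = 1
    time_limit: str = "01:00:00"


def render_batch_script(task: SlurmTask) -> str:
    """Render an sbatch script that runs the command inside the container."""
    directives = {
        "job-name": task.job_name,
        "account": task.account,
        "partition": task.partition,
        "nodes": task.nodes,
        "gpus-per-node": task.gpus_per_node,
        "time": task.time_limit,
        # %x expands to the job name and %j to the job ID.
        "output": f"{task.work_dir}/%x-%j.out",
        "error": f"{task.work_dir}/%x-%j.err",
    }
    lines = ["#!/bin/bash"]
    lines += [f"#SBATCH --{key}={value}" for key, value in directives.items()]
    # --nv exposes the host's NVIDIA GPUs inside the container.
    run_line = (
        f"srun apptainer exec --nv {shlex.quote(task.image_path)} "
        f"{shlex.join(task.command)}"
    )
    lines += ["", "set -euo pipefail", run_line]
    return "\n".join(lines) + "\n"
```

Writing the output and error files under the work directory mirrors the role of the Diamond Work Directory, which is where job logs are saved.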
Applications
The following applications are integrated with Diamond and available as containerized workloads:
OpenFold: Diamond provides a public container image of OpenFold hosted on Anvil. Users with an Anvil allocation can submit jobs via the Tasks interface for protein structure prediction, fine-tuning, and inference workflows.
DeepSpeed-Chat: Diamond provides a public container image of DeepSpeed-Chat hosted on NCSA Delta. Supported use cases include interactive chat sessions and job submission via the Tasks interface for fine-tuning and batch inference.
Profile
Manage your identity and global preferences in one place.
Identity: Managed by Globus Auth to ensure secure and seamless access to your resources.
