Diamond — Harnessing GPU Resources for Scientific Deep Learning


Diamond (diamondhpc.ai) is a comprehensive web platform and orchestration layer designed to abstract the inherent complexities of High-Performance Computing (HPC) and Deep Learning (DL) workflows. By providing a unified graphical user interface across heterogeneous clusters, Diamond enables researchers and engineers to develop, debug, and deploy large-scale models, and to share and reuse deep learning workload environments with collaborators, reducing redundant setup effort.

Diamond is built on top of the Globus Compute framework, which provides a unified interface for managing compute resources across a variety of clusters.

For more information on Diamond, see Diamond’s documentation and the eScience 2025 conference paper Diamond: Harnessing GPU Resources for Scientific Deep Learning.

Prerequisites

Before you begin, ensure you have the following in place:

Quickstart

  • Sign In: Navigate to diamondhpc.ai and log in via Globus Auth. Choose your preferred identity provider to get started.

  • Explore the Dashboard: Your command center for recent activities, active tasks, datasets, images, and endpoint status.

  • Activate Endpoints: Head to the Endpoints page. Find the endpoint you need in the “Available” tab, add it to your managed list, and make sure its status is Online.

  • Navigate: Use the sidebar menu to jump between building containers, transferring data, and monitoring tasks.

Key Features

Endpoints Management

Use Globus Compute endpoints to connect Diamond securely to your cluster and execute remote functions on your allocated resources.

  • Discovery: Add endpoints from the “Available” tab. Once an endpoint is added, Diamond automatically fetches account and partition information from the host machine (this may take a moment; thanks for your patience!).

  • Configuration: Managed endpoints appear in the “Managed” tab. Important: Set a Diamond Work Directory here to ensure all job logs are organized and saved in your preferred path.

  • Activation: Check the endpoint status and make sure your endpoint is live. If the endpoint status is Offline, log in to the cluster and run:

    globus-compute-endpoint start <your-endpoint-name>
    

Note: If you are using the Delta / DeltaAI Multi-user Endpoint, you can skip this step as the endpoint is always online. Once you add it to your managed list and set a Diamond Work Directory, you can start using Diamond to submit jobs to the cluster.
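Because job logs are written to the Diamond Work Directory, it should be a path your account can write to on the cluster. If you want to create and check that directory yourself before entering it on the Endpoints page, here is a minimal bash sketch to run on the cluster login node. The function `ensure_work_dir` and the example path are hypothetical conveniences, not part of Diamond or Globus Compute.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Diamond): create a directory to use as the
# Diamond Work Directory and confirm the current user can write to it.
ensure_work_dir() {
  local dir="$1"
  if [ -z "$dir" ]; then
    echo "usage: ensure_work_dir <path>" >&2
    return 2
  fi
  # -p creates missing parent directories and succeeds if the path already exists.
  mkdir -p -- "$dir" || return 1
  if [ ! -w "$dir" ]; then
    echo "directory is not writable: $dir" >&2
    return 1
  fi
  # Print the resolved absolute path, ready to paste into the Endpoints page.
  (cd -- "$dir" && pwd -P)
}

# Example (the path is a placeholder):
#   ensure_work_dir "$HOME/diamond-work"
```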

Image Builder

A portable container solution enabling reproducible deep learning workflows.

  • Library: Access a mix of your private builds and public images hosted by Diamond.

  • Custom Builds: If you need a custom image, click “Build New Image” and follow the guided prompts to generate a custom Apptainer image for your workflow.
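Before submitting a task, you may want to sanity-check an image from a shell on the cluster. The sketch below assumes the image is available as a `.sif` file on the cluster filesystem and that `apptainer` is on your PATH. The wrapper name `run_in_image`, the paths, and the PyTorch check in the example are hypothetical; the check only works if the image actually includes PyTorch.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Diamond): run one command inside an
# Apptainer image with GPU support, e.g. as a quick sanity check.
#   --nv    makes the host NVIDIA driver and GPUs available in the container
#   --bind  mounts a host directory at /data inside the container
run_in_image() {
  if [ "$#" -lt 3 ]; then
    echo "usage: run_in_image <image.sif> <host_data_dir> <command> [args...]" >&2
    return 2
  fi
  local image="$1" data_dir="$2"
  shift 2
  if [ ! -f "$image" ]; then
    echo "image not found: $image" >&2
    return 1
  fi
  apptainer exec --nv --bind "${data_dir}:/data" "$image" "$@"
}

# Example (placeholder paths; assumes the image contains PyTorch):
#   run_in_image my-image.sif /scratch/me/data \
#     python -c 'import torch; print(torch.cuda.is_available())'
```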

Datasets

Manage and explore datasets in a centralized registry of Diamond collections.

  • Registry: View all Diamond collections currently available to your account.

  • Globus Integration: Register new datasets directly as Globus collections to streamline high-speed data movement.
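Since Diamond collections are Globus collections, data can also be moved between collections from the command line with the Globus CLI. This is a minimal sketch assuming the `globus` CLI is installed and you have already run `globus login`; the helper name `submit_dataset_transfer`, the transfer label, and the collection IDs and paths are hypothetical placeholders, and none of this is a Diamond-specific API.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Diamond API): start a recursive Globus transfer
# between two collections with the Globus CLI and print the transfer task ID.
# Arguments: <src_collection_id> <src_path> <dst_collection_id> <dst_path>
submit_dataset_transfer() {
  if [ "$#" -ne 4 ]; then
    echo "usage: submit_dataset_transfer <src_id> <src_path> <dst_id> <dst_path>" >&2
    return 2
  fi
  local src_id="$1" src_path="$2" dst_id="$3" dst_path="$4"
  # --jmespath with --format unix reduces the CLI's JSON response to the bare task ID.
  globus transfer "${src_id}:${src_path}" "${dst_id}:${dst_path}" \
    --recursive --label "diamond dataset upload" \
    --jmespath 'task_id' --format unix
}

# Example (IDs and paths are placeholders):
#   task_id=$(submit_dataset_transfer "$SRC_ID" /home/me/data "$DST_ID" /datasets/mydata)
#   globus task wait "$task_id"
```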

Tasks & Job Submission

Submit and manage compute jobs across HPC resources.

  • Submit: Click “Submit New Task” to launch a Slurm job. Our interface guides you through the configuration, so you don’t have to write batch scripts from scratch.

  • Real-time Monitoring: Once a task is submitted, follow its progress in real time. Diamond streams output and error logs directly to your browser.
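To give a sense of what the Tasks interface saves you from writing, here is a hand-rolled equivalent for a single-node, containerized GPU job. It is a sketch under stated assumptions, not Diamond's actual template: the function `write_batch_script`, the job name, and the choice of directives are hypothetical. The account and partition correspond to the information Diamond fetches from the endpoint, and writing logs under the work directory mirrors the Diamond Work Directory setting.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not Diamond's template): turn job-form fields into a
# Slurm batch script that runs a command inside an Apptainer image on GPUs.
# Arguments: <account> <partition> <gpus> <time> <work_dir> <image.sif> <command> [args...]
write_batch_script() {
  if [ "$#" -lt 7 ]; then
    echo "usage: write_batch_script <account> <partition> <gpus> <time> <work_dir> <image> <command...>" >&2
    return 2
  fi
  local account="$1" partition="$2" gpus="$3" walltime="$4" workdir="$5" image="$6"
  shift 6
  local qimage cmd
  # %q shell-quotes each value so paths or arguments with spaces survive.
  qimage=$(printf '%q' "$image")
  cmd=$(printf '%q ' "$@")
  # %j in --output/--error is replaced by the Slurm job ID.
  cat <<EOF
#!/bin/bash
#SBATCH --job-name=diamond-task
#SBATCH --account=${account}
#SBATCH --partition=${partition}
#SBATCH --nodes=1
#SBATCH --gpus-per-node=${gpus}
#SBATCH --time=${walltime}
#SBATCH --output=${workdir}/%j.out
#SBATCH --error=${workdir}/%j.err

apptainer exec --nv ${qimage} ${cmd% }
EOF
}

# Example (all values are placeholders):
#   write_batch_script my-account gpu 1 01:00:00 /scratch/me/diamond-work \
#     my-image.sif python train.py > job.sbatch
#   sbatch job.sbatch
```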

Applications

The following applications are integrated with Diamond and available as containerized workloads:

  • OpenFold: Diamond provides a public container image of OpenFold hosted on Anvil. Users with an Anvil allocation can submit jobs via the Tasks interface for protein structure prediction, fine-tuning, and inference workflows.

  • DeepSpeed-Chat: Diamond provides a public container image of DeepSpeed-Chat hosted on NCSA Delta. Supported use cases include interactive chat sessions, as well as fine-tuning and batch inference jobs submitted via the Tasks interface.

Profile

Manage your identity and global preferences in one place.

  • Identity: Managed by Globus Auth to ensure secure and seamless access to your resources.