System Architecture

DeltaAI is designed to support applications needing GPU computing power and access to larger memory. DeltaAI has some important architectural features to facilitate new discovery and insight:

  • A single CPU architecture (ARM) and GPU architecture in the NVIDIA GH200 (Grace Hopper superchip).

  • A low latency and high bandwidth HPE/Cray Slingshot interconnect between compute nodes.

  • Lustre for home, projects, and scratch file systems.

  • Support for relaxed and non-POSIX I/O (feature not yet implemented).

  • Shared-node jobs with the smallest allocatable unit being 1 GH200 superchip.

  • Resources for persistent services in support of Gateways, Open OnDemand, and Data Transport nodes.

DeltaAI GH200 Compute Nodes

The DeltaAI compute ecosystem is composed of a single node type:

  • A quad or 4-way Grace-Hopper node based on the GH200 superchip.

  • Each superchip has a connected Slingshot 11 Cassini NIC, for a total of 4 NICs per node.

Each Grace-Hopper GH200 superchip has a 72-core Grace ARM CPU with 120 GB of memory and an NVIDIA H100 GPU with 96 GB of memory.

[Figure: NVIDIA GH200 Grace-Hopper superchip.]

4-Way NVIDIA GH200 GPU Compute Node Specifications

4-Way GH200 Grace Hopper Superchip Compute Node Specs

Specification                 Value
----------------------------  ------------------------------
Number of nodes               114
GPU                           NVIDIA H100
GPUs per node                 4 (1 per superchip)
GPU memory (GB)               96
CPU                           NVIDIA Grace
CPU sockets per node          4
Cores per socket              72 (1 superchip)
Cores per node                288
Hardware threads per core     1 (SMT off)
Hardware threads per node     288
Clock rate (GHz)              ~3.35
CPU RAM (GB)                  480 (120 per CPU), LPDDR5
GPU RAM (GB)                  384 (96 per GPU), HBM3
Cache (MiB) L1/L2/L3          18 / 288 / 456
Local storage (TB)            3.9
NIC (4 per node)              4 x 200 GbE

The Grace ARM CPUs have 1 NUMA domain per superchip.
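The per-node figures in the specification table follow directly from the per-superchip values. Below is a minimal sketch of that arithmetic, using only numbers from the table. The variable and function names are illustrative, and the aggregate NIC figure is a nominal line rate, not a measured bandwidth.

```python
# Per-superchip values taken from the specification table above.
SUPERCHIPS_PER_NODE = 4
CORES_PER_SUPERCHIP = 72
CPU_RAM_GB_PER_SUPERCHIP = 120
GPU_RAM_GB_PER_SUPERCHIP = 96
NIC_GBPS_PER_SUPERCHIP = 200   # nominal line rate of one NIC
NODES = 114


def node_totals():
    """Return per-node totals derived from the per-superchip values."""
    return {
        "cores": SUPERCHIPS_PER_NODE * CORES_PER_SUPERCHIP,
        "cpu_ram_gb": SUPERCHIPS_PER_NODE * CPU_RAM_GB_PER_SUPERCHIP,
        "gpu_ram_gb": SUPERCHIPS_PER_NODE * GPU_RAM_GB_PER_SUPERCHIP,
        "nic_gbps": SUPERCHIPS_PER_NODE * NIC_GBPS_PER_SUPERCHIP,
    }


totals = node_totals()
system_gpus = NODES * SUPERCHIPS_PER_NODE  # one GPU per superchip

print(totals)
print("GPUs in the system:", system_gpus)
```

Because shared-node jobs are allocated in units of one superchip, the per-superchip constants are also the resources available to the smallest job.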

4-Way NVIDIA GH200 Mapping and GPU-CPU Affinitization

4-Way GH200 Mapping and Affinitization

       GPU0  GPU1  GPU2  GPU3  HSN   CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0   X     NV6   NV6   NV6   hsn0  0-71          0              4
GPU1   NV6   X     NV6   NV6   hsn1  72-143        1              12
GPU2   NV6   NV6   X     NV6   hsn2  144-215       2              20
GPU3   NV6   NV6   NV6   X     hsn3  216-287       3              28
HSN    hsn0  hsn1  hsn2  hsn3  X

Table Legend:

  • X = Self

  • SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)

  • NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node

  • PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)

  • PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)

  • PIX = Connection traversing at most a single PCIe bridge

  • NV# = Connection traversing a bonded set of # NVLinks
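The mapping table can be encoded as a small lookup so that a launcher script can pick the CPU cores and NIC that are local to a given GPU. Below is a minimal sketch, assuming the mapping shown in the table holds on every node. The `AFFINITY` dictionary and the `placement_for_gpu` helper are hypothetical names introduced here for illustration; they are not part of any DeltaAI tool.

```python
# Values copied from the mapping table; the names are illustrative only.
AFFINITY = {
    0: {"hsn": "hsn0", "cpus": range(0, 72), "numa": 0, "gpu_numa": 4},
    1: {"hsn": "hsn1", "cpus": range(72, 144), "numa": 1, "gpu_numa": 12},
    2: {"hsn": "hsn2", "cpus": range(144, 216), "numa": 2, "gpu_numa": 20},
    3: {"hsn": "hsn3", "cpus": range(216, 288), "numa": 3, "gpu_numa": 28},
}


def placement_for_gpu(gpu_index):
    """Return the CPU list, NUMA node, NIC, and GPU NUMA ID local to one GPU."""
    entry = AFFINITY[gpu_index]
    cpus = entry["cpus"]
    return {
        "cpu_list": f"{cpus.start}-{cpus.stop - 1}",  # inclusive core range
        "numa_node": entry["numa"],
        "nic": entry["hsn"],
        "gpu_numa_node": entry["gpu_numa"],
    }


for gpu in sorted(AFFINITY):
    print(gpu, placement_for_gpu(gpu))
```

For example, a process driving GPU2 would be bound to cores 144-215 on NUMA node 2 and would use hsn2.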

GPU NUMA ID: this is a new feature of the GH200, in which both the CPU and the GPU can see the memory domains. The domains are also visible via the numactl command:

numactl --show
policy: default
preferred node: current
physcpubind
cpubind: 0 1 2 3
nodebind: 0 1 2 3
membind: 0 1 2 3 4 12 20 28
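The GPU memory domains can be separated from the CPU domains by comparing the membind and nodebind lines. Below is a minimal sketch that parses the sample output shown above, embedded as a string rather than obtained by running numactl. The helper names are hypothetical, and treating "memory nodes with no CPUs bound" as the GPU domains is an assumption based on this output.

```python
# Sample text copied from the numactl --show output above.
SAMPLE = """\
policy: default
preferred node: current
physcpubind
cpubind: 0 1 2 3
nodebind: 0 1 2 3
membind: 0 1 2 3 4 12 20 28
"""


def parse_numactl_show(text):
    """Parse 'key: value ...' lines into a dict of key -> list of tokens."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a colon, such as "physcpubind" here
        fields[key.strip()] = value.split()
    return fields


def gpu_memory_nodes(text):
    """Return memory nodes that are not CPU nodes (assumed to be GPU memory)."""
    fields = parse_numactl_show(text)
    cpu_nodes = {int(n) for n in fields.get("nodebind", [])}
    mem_nodes = {int(n) for n in fields.get("membind", [])}
    return sorted(mem_nodes - cpu_nodes)


print(gpu_memory_nodes(SAMPLE))
```

The resulting node IDs match the GPU NUMA ID column of the mapping table.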

Login Nodes

Login nodes provide interactive support for code compilation, job submission, and interactive salloc/srun. They do not contain GPUs. See DeltaAI Login Methods for more information.

Specialized Nodes

DeltaAI supports data transfer nodes (serving the “NCSA Delta” Globus collection) and nodes in support of other services.

Network

DeltaAI is connected to the NPCF core router and exit infrastructure via two 100 Gbps connections. NCSA’s 400+ Gbps of WAN connectivity carries traffic to and from users over an optimal peering.

DeltaAI resources are interconnected with HPE/Cray’s 200 Gbps Slingshot 11 interconnect.

File Systems

Warning

There are no backups or snapshots of the DeltaAI file systems (internal or external). You are responsible for backing up your files. There is no mechanism to retrieve a file if you have removed it, or to recover an older version of any file or data.

Note

For more information on the DeltaAI file systems, including paths and quotas, go to Data Management - File Systems.

Users of DeltaAI have access to three file systems at the time of system launch; a fourth, relaxed-POSIX file system will be made available at a later date.

DeltaAI

The DeltaAI storage infrastructure provides users with their HOME, PROJECTS, and WORK areas. These file systems are mounted across all DeltaAI nodes and are accessible on the DeltaAI DTN endpoints. The HOME file system runs on an NCSA center-wide VAST system. The PROJECTS file system (see below) is provided by Taiga, NCSA’s center-wide, Lustre-based file system. The WORK file systems run Lustre and are available on both HDD (hard disk drive) and NVMe SSD (Non-Volatile Memory Express solid-state drive) storage, each with its own quota.

Hardware

Under Construction.

Taiga

Taiga is NCSA’s global file system, which provides users with their $PROJECT area. This file system is mounted across all Delta systems at /taiga (note that Taiga is used to provision the Delta /projects file system from /taiga/nsf/delta) and is accessible on both the Delta and Taiga DTN endpoints. For NCSA and Illinois researchers, Taiga is also mounted across NCSA’s HAL, HOLL-I, and Radiant compute environments. This storage subsystem has an aggregate performance of 110 GB/s, and 1 PB of its capacity is allocated to users of the Delta system. /taiga is a Lustre file system running DDN’s EXAScaler 6 Lustre stack. See the Taiga documentation for more information.

Note

Running “module reset” in a job script populates the $WORK and $SCRATCH environment variables automatically. Alternatively, you may set them manually as WORK=/projects/<account>/$USER and SCRATCH=/scratch/<account>/$USER.
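For scripts that may run in an environment where these variables have not been populated, the fallback described in the note can be sketched as follows. The `resolve_dirs` helper and its `account` argument are hypothetical names used for illustration; the path templates are the ones given in the note.

```python
import os


def resolve_dirs(account, env=None):
    """Return (work, scratch), preferring $WORK and $SCRATCH when they are set."""
    env = os.environ if env is None else env
    user = env.get("USER", "")
    # Fall back to the path templates from the note when a variable is unset.
    work = env.get("WORK") or f"/projects/{account}/{user}"
    scratch = env.get("SCRATCH") or f"/scratch/{account}/{user}"
    return work, scratch


# Example with an explicit environment mapping; the account "abcd" and the
# user "alice" are placeholders, not real DeltaAI values.
print(resolve_dirs("abcd", env={"USER": "alice"}))
```

Passing the environment as an argument keeps the helper easy to test. In a job script it would normally be called with only the account name, so that it reads the real environment.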