Access the Compute Nodes

Login (or head) nodes in a cluster are a small set of nodes for interactive user tasks, such as editing files, setting up jobs, and compiling code. The compute (computational) nodes, on the other hand, are a much larger pool of nodes reserved for production computing. You use your allocation account(s) to request and pay for time on the compute nodes; this section describes how to do that.

When you request a set of resources to do computing, that grouping of resources, charged to one of your allocation accounts, is a "job": a temporary allocation of resources that does work on your behalf. The Slurm workload manager grants access to those resources and charges the time used against your account. This section describes how to use Slurm to create and manage jobs.
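
As a minimal sketch of requesting a job with Slurm's sbatch command (the account name, partition, node count, and time limit below are placeholders; substitute values valid for your allocation and system):

$ cat myjob.sbatch
#!/bin/bash
#SBATCH --account=my_allocation   # allocation account to charge (placeholder)
#SBATCH --partition=cpu           # partition (queue) to run in (placeholder)
#SBATCH --nodes=1                 # number of compute nodes
#SBATCH --time=00:30:00           # wall-clock limit, HH:MM:SS
#SBATCH --job-name=example

srun hostname                     # the work to run on the allocated node(s)
$ sbatch myjob.sbatch
Submitted batch job 12345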

Partitions (Queues)

The compute nodes on each resource are grouped into partitions (queues), each with its own limits, such as the maximum number of nodes, maximum wall time, and memory that can be assigned to a single job. Typically there are development or debugging queues with very short maximum job times where you can get nodes quickly, and production queues with longer maximum times.
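
You can list a system's partitions and their limits with sinfo; the partition names, time limits, and node counts below are illustrative only:

$ sinfo --summarize
PARTITION AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
cpu*         up 2-00:00:00     100/20/4/124  cn[001-124]
debug        up      30:00         2/6/0/8   cn[125-132]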

Consult the resource-specific documentation for more information on a system's queues.

Access a Compute Node in a Running Job

Once a job has started, you can SSH directly from a login node to the compute node(s) assigned to that job. First, find the assigned node(s) with squeue:

$ squeue --job jobid
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             12345       cpu     bash   gbauer  R       0:17      1 cn001
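
If you do not know the job ID, listing only your own jobs is one way to find it (the output mirrors the example above):

$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             12345       cpu     bash   gbauer  R       0:17      1 cn001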

Then, in a terminal session, SSH to a node listed under NODELIST(REASON):

$ ssh cn001
cn001.delta.internal.ncsa.edu (172.28.22.64)
  OS: RedHat 8.4   HW: HPE   CPU: 128x    RAM: 252 GB
  Site: mgmt  Role: compute
$
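
Once connected, ordinary Linux tools can be used to check on your processes on that node; for example:

$ ps -u $USER -o pid,pcpu,pmem,etime,comm
$ free -h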

See also: Monitoring a Node During a Job.