Access the Compute Nodes
The login (or head) nodes of a cluster are a small number of nodes reserved for interactive user tasks, such as editing files, setting up jobs, and compiling code. The compute (computational) nodes, on the other hand, are a much larger pool of nodes available for production computing. You use your allocation account(s) to request and use time on the compute nodes; this section explains how to do that.
When you request a set of resources for computing, that grouping of resources, charged to the allocation account you are using to pay for them, is a "job": a temporary grouping of resources that does work on your behalf. The Slurm workload manager controls your access to jobs and tracks the usage charged against your account. This section describes how to use Slurm to create and manage jobs.
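For example, the following is a minimal sketch of requesting an interactive job with salloc; the account name (myaccount), partition name (cpu), node count, and time limit shown here are placeholders to replace with your own allocation and a partition available on your system:

$ salloc --account=myaccount --partition=cpu --nodes=1 --time=00:30:00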
Partitions (Queues)
The compute nodes on each resource are grouped into partitions (queues), each with its own limits on jobs that run in it, including the maximum number of nodes, maximum wall time, and maximum memory that can be assigned to a single job. Typically there are development or debugging partitions with very short maximum job times, where you can get nodes quickly, and partitions with longer maximum times for production computing.
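To see which partitions a system offers and their limits, sinfo can summarize them. As an illustrative sketch, this format string prints each partition's name, time limit, node count, and per-node memory:

$ sinfo --format="%P %l %D %m"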
Consult the resource-specific documentation for more information on a system's queues.
Access a Compute Node in a Running Job
Once a job has started, you can SSH directly from a login node to any compute node assigned to that job. First, find the node assigned to your job with squeue:
$ squeue --job jobid
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
12345 cpu bash gbauer R 0:17 1 cn001
Then, in a terminal session, SSH to the node listed under NODELIST:
$ ssh cn001
cn001.delta.internal.ncsa.edu (172.28.22.64)
OS: RedHat 8.4 HW: HPE CPU: 128x RAM: 252 GB
Site: mgmt Role: compute
$
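If you are scripting this, a brief sketch (assuming a single-node job and the job ID 12345 from the example above) that captures the assigned node name and opens the SSH session:

$ NODE=$(squeue --job 12345 --noheader --format=%N)   # %N prints the allocated node(s)
$ ssh "$NODE"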
See also: Monitoring a Node During a Job.