Access the Compute Nodes

A cluster's login (or head) nodes are a small number of nodes reserved for interactive user tasks, like editing files, setting up jobs, and compiling code. The compute (computational) nodes, on the other hand, are a much larger pool of nodes available for production computing. You use your allocation account(s) to request and pay for time on the compute nodes. This section describes how to do that.

When you ask for a set of resources to do computing, that grouping of resources, charged to the allocation account you specify, is a “job”. A job is a temporary grouping of resources to do work on your behalf. The Slurm workload manager handles your access to compute nodes and tracks what each job charges against your allocation. This section describes how to use Slurm to create and manage jobs.
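As a concrete illustration, a job is commonly described in a small batch script: Slurm #SBATCH directives stating what resources to reserve and which allocation to charge, followed by the commands to run. This is a minimal sketch; the account name (myproject), partition name (cpu), and program (./my_program) are placeholders, and the directives your system expects may differ:

```shell
#!/bin/bash
#SBATCH --account=myproject   # allocation account to charge (placeholder)
#SBATCH --partition=cpu       # partition (queue) to run in (placeholder)
#SBATCH --nodes=1             # number of compute nodes
#SBATCH --time=00:30:00       # wall-time limit (HH:MM:SS)
#SBATCH --job-name=example

./my_program                  # the work the job does on your behalf
```

Submitting the script (for example, with sbatch) creates the job; Slurm queues it until the requested resources are available, then charges the time used against the named allocation account.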

Your Allocations

You must be attached to an allocation account to have access to any system, so on any system you will have access to at least one allocation account, and possibly more than one. You should only charge resources to the allocation account for the project you’re doing the work for.

(“Account” is unfortunately used for two different things in HPC. Your login account is unique to you as a person and is your credential for logging into a system at NCSA; it is either your NCSA Kerberos username, password, and Duo 2-factor, or your UIUC campus NetID, password, and Duo 2-factor. Your allocation account is like a bank account; it contains a balance that you spend by using resources. You may have one allocation account, or you may have multiple if you’re attached to more than one project.)

To obtain a list of the allocation(s) that you are authorized to use on any NCSA system, run the “accounts” command. This prints the project codes (each corresponding to an allocation account), and on many systems it also lists the current balance in each allocation account.
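For example, the output looks roughly like the following; the exact columns vary by system, and the project code and balance shown here are made up:

```shell
$ accounts
PROJECT      DESCRIPTION              BALANCE
abcd-delta   My Research Project      1000.0
```

Use the project code from the first column wherever Slurm asks for an account.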

Generally, when Slurm commands or arguments ask for an “account”, they mean your allocation account, not your login account (your username). You as a person have only one login account (and Slurm knows who you are from your shell environment), but you may have multiple allocations that you do work for, so Slurm needs to know which allocation to charge for the job resources you’re requesting when you start a job.
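For example, you can pass the project code to Slurm’s --account option on the command line, or set it inside a batch script; abcd-delta is a placeholder project code here, and the partition name is an assumption:

```shell
# Interactive job, charged to the abcd-delta allocation:
srun --account=abcd-delta --partition=cpu --time=00:10:00 --pty bash

# Batch job, charged to the same allocation:
sbatch --account=abcd-delta myscript.sbatch
```

Equivalently, a batch script can carry the directive #SBATCH --account=abcd-delta, so every submission of that script charges the same allocation.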

Partitions (Queues)

The compute nodes on each resource are grouped into partitions (queues), each with different limits on the jobs that run in it. The limits include the maximum number of nodes, the maximum wall time, and the memory that can be assigned to a single job. Typically there will be queues for development or debugging with very short maximum job times but fast access to nodes, and queues with longer maximum times for production computing.
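You can inspect the partitions on a system, and their limits, with Slurm’s sinfo command; a sketch:

```shell
# One line per partition: availability, time limit, node counts and states
sinfo --summarize

# Custom format: partition name, wall-time limit, job size (nodes) per job
sinfo --format="%P %l %s"
```

The actual partition names and limits are system-specific, which is why the resource-specific documentation below is the authoritative list.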

Consult the resource-specific documentation for more information on a system’s queues:

Access a Compute Node in a Running Job

Once a job has started, you can SSH directly from a login node to any compute node assigned to that job. First, find the node(s) assigned to your job with squeue:

$ squeue --job jobid
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             12345       cpu     bash   gbauer  R       0:17      1 cn001

Then in a terminal session:

$ ssh cn001
cn001.delta.internal.ncsa.edu (172.28.22.64)
  OS: RedHat 8.4   HW: HPE   CPU: 128x    RAM: 252 GB
  Site: mgmt  Role: compute
$

See also, Monitoring a Node During a Job.