New User Guide

Note

Start here if you’re new to HAL and HPC clusters. If you’re an experienced HPC user and only want the HAL-specific basics, go to the Quick Start Guide.

Access the System

  1. Set Up a User Account.

  2. Log in for the first time using SSH.

    Log in to HAL for the first time by running one of the following ssh commands in a terminal; this initializes your account. Replace <username> with your NCSA username.

    ssh <username>@hal.ncsa.illinois.edu
    
    ssh <username>@hal-login2.ncsa.illinois.edu
    
  3. Log in with HAL OnDemand.

    Warning

    Before you log in to HAL OnDemand for the first time, you must log in to HAL via ssh to initialize your account.

    Log in to the HAL OnDemand dashboard with your NCSA username and password.

    Go to HAL OnDemand for more information.

Interactive Jobs

Start an Interactive Job

You can start an interactive job on HAL using the Slurm srun command or the swrun command from the Slurm wrapper suite.

srun

srun --partition=debug --pty --nodes=1 --ntasks-per-node=16 --cores-per-socket=4 \
--threads-per-core=4 --sockets-per-node=1 --mem-per-cpu=1200 --gres=gpu:v100:1 \
--time 01:30:00 --wait=0 --export=ALL /bin/bash
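
This requests an interactive bash shell on one node in the debug partition for up to 1 hour and 30 minutes, with 16 tasks, one V100 GPU, and 1200 MB of memory per CPU.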

swrun

swrun -p gpux1
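
Once the interactive shell starts, you are on a compute node and can check the allocation before running your work. For example:

hostname      # confirm you are on a compute node, not the login node
nvidia-smi    # show the GPU(s) allocated to the job
exit          # end the interactive job and release the resources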

Keep Interactive Jobs Alive

Interactive jobs end when you disconnect from the login node, whether intentionally or because your internet connection drops. To keep an interactive job alive across disconnections, you can use a terminal multiplexer like tmux.

Set Up

Start tmux on the login node before you start an interactive job.

tmux
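
You can also give the session a name so it is easier to find later; the session name hal_job below is only an example. Start the interactive job from inside the tmux session:

tmux new -s hal_job
swrun -p gpux1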

Reconnect

When you get disconnected, reconnect to the login node and attach to the tmux session with the following command:

tmux attach

If you have multiple sessions running:

  1. Use the following command to list the session IDs:

    tmux list-sessions
    
  2. Use the following command to attach to a session. Replace <session_id> with the ID of the session you want to connect to.

    tmux attach -t <session_id>
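
To disconnect on purpose while leaving a session (and any interactive job inside it) running, detach instead of closing the terminal: press Ctrl-b, then d, or run the following from inside the session:

tmux detach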
    

Batch Jobs

Submit Jobs with Original Slurm Commands
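
The following example script requests one node with 16 tasks, one V100 GPU, 1200 MB of memory per CPU, and 4 hours of wall time on the gpu partition, then runs hostname on the allocation: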

#!/bin/bash
#SBATCH --job-name="demo"
#SBATCH --output="demo.%j.%N.out"
#SBATCH --error="demo.%j.%N.err"
#SBATCH --partition=gpu
#SBATCH --time=4:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=4
#SBATCH --threads-per-core=4
#SBATCH --mem-per-cpu=1200
#SBATCH --export=ALL
#SBATCH --gres=gpu:v100:1

srun hostname
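
To run this script, save it to a file and submit it with sbatch; the filename demo.sbatch below is only an example. You can then check the job’s status with squeue:

sbatch demo.sbatch       # submit the batch script; Slurm prints the job ID
squeue -u <username>     # list your queued and running jobs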

Submit Jobs with Slurm Wrapper Suite
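
With the wrapper suite, the same kind of job can be described with just a partition and a time limit: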

#!/bin/bash
#SBATCH --job-name="demo"
#SBATCH --output="demo.%j.%N.out"
#SBATCH --error="demo.%j.%N.err"
#SBATCH --partition=gpux1
#SBATCH --time=4

srun hostname
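
The wrapper suite also includes an swbatch command for submitting wrapper-style scripts. As a sketch, assuming the script above is saved as demo.swb (the filename is illustrative):

swbatch demo.swb

See Running Jobs for the authoritative swbatch usage.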

Submit a Job with Multiple Tasks

#!/bin/bash
#SBATCH --job-name="demo"
#SBATCH --output="demo.%j.%N.out"
#SBATCH --error="demo.%j.%N.err"
#SBATCH --partition=gpux1
#SBATCH --time=4

# Launch four MPI tasks in the background so they run concurrently
mpirun -n 4 hostname &
mpirun -n 4 hostname &
mpirun -n 4 hostname &
mpirun -n 4 hostname &

# Wait for all background tasks to finish before the job ends
wait

For detailed information about using Slurm on HAL, go to Running Jobs.