New User Guide
Note
Start here if you’re new to HAL and HPC clusters. If you’re an adept HPC user and only want the HAL-specific basics, go to the Quick Start Guide.
Access the System
Log in for the first time using SSH.
Log in to HAL for the first time with one of the following ssh commands, in a terminal, to initialize your account. Replace <username> with your NCSA username.
ssh <username>@hal.ncsa.illinois.edu
ssh <username>@hal-login2.ncsa.illinois.edu
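If you log in often, you can optionally add a host alias to the SSH configuration on your local machine so that ssh hal connects to the cluster. This is a minimal sketch; the alias name hal is an arbitrary example, and the file lives on your own computer, not on HAL.
# ~/.ssh/config on your local machine; "hal" is an arbitrary alias
Host hal
    HostName hal.ncsa.illinois.edu
    User <username>
After saving the file, ssh hal is equivalent to the full command above.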
Log in with HAL OnDemand.
Warning
Before you log in to HAL OnDemand for the first time, you must log in to HAL via ssh to initialize your account.
Log in to the HAL OnDemand dashboard with your NCSA username and password.
Go to HAL OnDemand for more information.
Interactive Jobs
Start an Interactive Job
You can start an interactive job on HAL using the Slurm srun command or the Slurm wrapper suite swrun command.
srun
srun --partition=debug --pty --nodes=1 --ntasks-per-node=16 --cores-per-socket=4 \
--threads-per-core=4 --sockets-per-node=1 --mem-per-cpu=1200 --gres=gpu:v100:1 \
--time 01:30:00 --wait=0 --export=ALL /bin/bash
swrun
swrun -p gpux1
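Once the interactive job starts, your shell runs on a compute node. A quick way to confirm the allocation is to print the node name and list the GPU assigned to the job; this assumes the standard NVIDIA nvidia-smi tool is on the compute node's path.
hostname     # prints a compute node name rather than the login node
nvidia-smi   # shows the GPU(s) allocated to the job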
Keep Interactive Jobs Alive
Interactive jobs end when you disconnect from the login node, whether intentionally or because your internet connection drops. To keep an interactive job alive across disconnects, use a terminal multiplexer such as tmux.
Set Up
Start tmux on the login node before you start an interactive job.
tmux
Reconnect
When you get disconnected, reconnect to the login node and attach to the tmux session with the following command:
tmux attach
If you have multiple sessions running:
Use the following command to list the session IDs:
tmux list-sessions
Use the following command to attach to a session. Replace <session_id> with the session ID you want to connect to.
tmux attach -t <session_id>
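If you keep more than one session around, you can optionally name each tmux session when you create it, which makes reattaching easier than remembering numeric IDs; the name myjob below is just an example.
tmux new -s myjob        # start a named session on the login node
tmux attach -t myjob     # reattach to it after reconnecting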
Batch Jobs
Submit Jobs with Original Slurm Command
#!/bin/bash
#SBATCH --job-name="demo"
#SBATCH --output="demo.%j.%N.out"
#SBATCH --error="demo.%j.%N.err"
#SBATCH --partition=gpu
#SBATCH --time=4:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=4
#SBATCH --threads-per-core=4
#SBATCH --mem-per-cpu=1200
#SBATCH --export=ALL
#SBATCH --gres=gpu:v100:1
srun hostname
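To submit this script, save it to a file and pass it to sbatch, then monitor it with squeue; the file name demo.sb below is just an example.
sbatch demo.sb       # submit the batch script; Slurm prints the assigned job ID
squeue -u $USER      # list your pending and running jobs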
Submit Jobs with Slurm Wrapper Suite
#!/bin/bash
#SBATCH --job-name="demo"
#SBATCH --output="demo.%j.%N.out"
#SBATCH --error="demo.%j.%N.err"
#SBATCH --partition=gpux1
#SBATCH --time=4
srun hostname
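To submit a wrapper-suite script, the wrapper suite provides an swbatch command analogous to sbatch. This is a sketch assuming swbatch is available on the login node (see Running Jobs), with demo_gpux1.sb as an example file name.
swbatch demo_gpux1.sb    # submit via the Slurm wrapper suite (assumed available)
squeue -u $USER          # monitor the job in the queue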
Submit a Job with Multiple Tasks
#!/bin/bash
#SBATCH --job-name="demo"
#SBATCH --output="demo.%j.%N.out"
#SBATCH --error="demo.%j.%N.err"
#SBATCH --partition=gpux1
#SBATCH --time=4
mpirun -n 4 hostname &
mpirun -n 4 hostname &
mpirun -n 4 hostname &
mpirun -n 4 hostname &
wait
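The four mpirun launches run concurrently because each is placed in the background with &, and wait keeps the job alive until all of them finish. To keep their output from interleaving in the shared .out file, you can optionally redirect each launch to its own file; a sketch, with example file names:
mpirun -n 4 hostname > task0.out 2>&1 &
mpirun -n 4 hostname > task1.out 2>&1 &
mpirun -n 4 hostname > task2.out 2>&1 &
mpirun -n 4 hostname > task3.out 2>&1 &
wait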
For detailed information about using Slurm on HAL, go to Running Jobs.