Running Jobs
The Illinois HTC system uses HTCondor for workload and job management. The basics of submitting and monitoring jobs, along with system-specific configurations, are outlined below.
Note
If you are new to HTCondor, the HTCondor quick start guide and the submitting HTCondor jobs tutorial video are good places to start.
Job Submission
Review the HTCondor quick start guide - A first HTCondor job for an introduction to the HTCondor job submission process.
The Illinois HTC access point (AP) operating system is Red Hat Enterprise Linux Server release 7.9 (Maipo).
Submit Description File
Your submit description file must include requests for CPUs, memory, and disk. The table below outlines the default and maximum allowed values for these variables.
Do not automatically request the maximum values; set these variables based on the needs of your job. Underestimating could result in your job being placed on hold. Overestimating needlessly ties up resources that would otherwise be available to other users.
Variable          Default Value   Maximum Value
request_cpus      1               48
request_memory    1G              180G
request_disk      40G             1TB
queue             1               31
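As a rough illustration of checking requests before submitting, the sketch below compares a set of requested values against the maximums in the table above. The helper name check_requests, the dictionary keys ending in _gb, and the choice to treat 1TB as 1024 GB are assumptions made for this example; they are not part of HTCondor or the Illinois HTC configuration.

```python
# Maximums copied from the table above. Sizes are expressed in GB here,
# and 1TB is treated as 1024 GB (an assumption of this sketch).
MAXIMUMS = {
    "request_cpus": 48,
    "request_memory_gb": 180,
    "request_disk_gb": 1024,
    "queue": 31,
}

# The submit description file must include these three requests.
REQUIRED = ("request_cpus", "request_memory_gb", "request_disk_gb")


def check_requests(requests):
    """Return a list of problems found in a dict of requested values."""
    problems = []
    for name in REQUIRED:
        if name not in requests:
            problems.append(f"{name} is missing")
    for name, value in requests.items():
        if name not in MAXIMUMS:
            problems.append(f"{name} is not a known variable")
        elif value < 1:
            problems.append(f"{name} must be at least 1")
        elif value > MAXIMUMS[name]:
            problems.append(
                f"{name}={value} exceeds the maximum of {MAXIMUMS[name]}"
            )
    return problems


# A request within the limits produces an empty list.
print(check_requests(
    {"request_cpus": 4, "request_memory_gb": 8, "request_disk_gb": 40}
))
# A request that omits disk and asks for too many CPUs is flagged twice.
print(check_requests({"request_cpus": 64, "request_memory_gb": 8}))
```

A check like this only catches values outside the allowed range. It cannot tell you whether a request matches what the job actually needs, which still has to come from measuring the job.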
Reminder: queue should be the last line in your submit description file; commands on lines after the queue line are ignored.
Illinois HTC does not have a shared filesystem, so the should_transfer_files variable in your submit file should be set to YES.
should_transfer_files = YES
To implement basic self-checkpointing in case a job is evicted before it finishes, set the when_to_transfer_output variable to ON_EXIT_OR_EVICT.
when_to_transfer_output = ON_EXIT_OR_EVICT
Refer to the HTCondor documentation - condor_submit page for a complete list of submit description file options.
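To show how the settings described in this section fit together, here is a minimal sketch that assembles the text of a submit description file. The function name render_submit_file, the executable name run.sh, and the log, output, and error file names are hypothetical; only the resource requests, the two file transfer settings, and the placement of queue as the last line come from this page.

```python
def render_submit_file(executable, cpus, memory, disk, count=1):
    """Build the text of a submit description file as a string."""
    lines = [
        f"executable = {executable}",
        # Resource requests, which this page says must be included.
        f"request_cpus = {cpus}",
        f"request_memory = {memory}",
        f"request_disk = {disk}",
        # No shared filesystem, so files must be transferred.
        "should_transfer_files = YES",
        # Also transfer output if the job is evicted.
        "when_to_transfer_output = ON_EXIT_OR_EVICT",
        # Hypothetical file names; HTCondor expands $(Cluster) and $(Process).
        "log = job_$(Cluster).log",
        "output = job_$(Cluster)_$(Process).out",
        "error = job_$(Cluster)_$(Process).err",
        # queue goes last because later lines are ignored.
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


# Print a file that uses the default values from the table above.
print(render_submit_file("run.sh", 1, "1G", "40G"))
```

The returned text could be written to a file (for example job.sub, a name chosen here for illustration) and passed to condor_submit on the access point.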
How to Monitor Jobs
The command to monitor your jobs in the queue is below. Jobs are summarized in batches and only jobs associated with your username are displayed.
condor_q
View your jobs individually, rather than in batches, with the -nobatch option.
condor_q -nobatch
If a job is on hold, the job status will be listed as H. View a list of your “on hold” jobs and the reason why they are on hold with the -hold option.
condor_q -hold
Refer to the HTCondor documentation - condor_q page for a complete list of queue options.
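For scripted monitoring, one possible approach is to read the machine-readable output of condor_q -json instead of the text table. The sketch below assumes the output is a JSON list of job attribute records and relies on the standard job attributes ClusterId, ProcId, JobStatus (where 5 means held), and HoldReason. The function name held_jobs and the sample records are made up for illustration.

```python
import json


def held_jobs(condor_q_json):
    """Return (job_id, hold_reason) pairs from `condor_q -json` output."""
    text = condor_q_json.strip()
    # Treat empty output as an empty queue.
    ads = json.loads(text) if text else []
    held = []
    for ad in ads:
        if ad.get("JobStatus") == 5:  # 5 is the status code for a held job
            job_id = f"{ad['ClusterId']}.{ad['ProcId']}"
            held.append((job_id, ad.get("HoldReason", "unknown")))
    return held


# Made-up sample data standing in for real condor_q -json output.
sample = """[
  {"ClusterId": 101, "ProcId": 0, "JobStatus": 2},
  {"ClusterId": 102, "ProcId": 0, "JobStatus": 5,
   "HoldReason": "Example hold reason"}
]"""

print(held_jobs(sample))
```

On the access point, the input string could come from capturing the output of condor_q -json, for example with subprocess.run. For interactive use, condor_q -hold remains the simpler option.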
How to Release or Remove Jobs in the Queue
Release an “on hold” job back into the idle queue with the condor_release command below. (Replace JobID with the JobID you want to release.)
condor_release JobID
Manually remove a job from the queue with the condor_rm command. (Replace JobID with the JobID you want to remove.)
condor_rm JobID
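When releasing or removing jobs from a script, it can help to validate the JobIDs first. The sketch below builds the argument list for condor_release or condor_rm and rejects anything that is not a numeric ClusterId or ClusterId.ProcId. The function name queue_command and the action labels "release" and "remove" are assumptions for this example.

```python
import re

# Matches a ClusterId such as 102 or a ClusterId.ProcId such as 102.0.
JOB_ID = re.compile(r"^\d+(\.\d+)?$")

COMMANDS = {"release": "condor_release", "remove": "condor_rm"}


def queue_command(action, job_ids):
    """Return the argument list for releasing or removing the given jobs."""
    if action not in COMMANDS:
        raise ValueError(f"unknown action: {action}")
    bad = [job_id for job_id in job_ids if not JOB_ID.match(job_id)]
    if bad:
        raise ValueError(f"not a valid JobID: {', '.join(bad)}")
    return [COMMANDS[action], *job_ids]


print(queue_command("release", ["102.0"]))
print(queue_command("remove", ["101", "102.0"]))
```

The strict pattern is a deliberate choice: condor_rm also accepts a user name, which removes every job owned by that user, so accepting only numeric JobIDs guards against doing that by accident. The returned list can be passed to subprocess.run to execute the command.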