Running Jobs

The Illinois HTC system uses HTCondor for workload and job management. The basics of submitting and monitoring jobs, along with system-specific configurations, are outlined below.

Note

If you are new to HTCondor, the HTCondor quick start guide and the submitting HTCondor jobs tutorial video are good places to start.

Job Submission

For an introduction to the HTCondor job submission process, review the HTCondor quick start guide - A first HTCondor job.

The Illinois HTC access point (AP) operating system is Red Hat Enterprise Linux Server release 7.9 (Maipo).

Submit Description File

  • Your submit description file must include requests for CPUs, memory, and disk. The table below outlines the default and maximum allowed values for these variables.

    Do not request the maximum values by default; set these variables based on what your job actually needs. Underestimating can result in your job being placed on hold. Overestimating needlessly ties up resources that other users could otherwise use.

    Submit Description File Resource Variable Defaults and Maximums

    Variable          Default Value    Maximum Value
    request_cpus      1                48
    request_memory    1G               180G
    request_disk      40G              1TB
    queue             1                31

  • Reminder: queue should be the last line in your submit description file. Any commands on lines after the queue line are ignored.

  • Illinois HTC does not have a shared filesystem; therefore, the should_transfer_files variable in your submit file should be set to YES.

    should_transfer_files = YES
    
  • To implement basic self-checkpointing in case a job is evicted, set the when_to_transfer_output variable to ON_EXIT_OR_EVICT. HTCondor then saves the job's output files both when the job exits and when it is evicted.

    when_to_transfer_output = ON_EXIT_OR_EVICT
    
  • Refer to the HTCondor documentation - condor_submit page for a complete list of submit description file options.
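
The settings above can be combined into one submit description file. The following is a minimal sketch, not an official template: the file name job.sub, the script name my_job.sh, and the log, output, and error file names are placeholders chosen for illustration, and the resource requests simply reuse the documented defaults. Adjust all of them to fit your job.

```shell
# Write a minimal submit description file to job.sub.
# job.sub, my_job.sh, and the job_* file names are hypothetical placeholders.
cat > job.sub <<'EOF'
# Program to run (hypothetical script name)
executable              = my_job.sh

# HTCondor event log, stdout, and stderr, named per cluster and process
log                     = job_$(Cluster).log
output                  = job_$(Cluster)_$(Process).out
error                   = job_$(Cluster)_$(Process).err

# Resource requests: the documented defaults, used here as a starting point
request_cpus            = 1
request_memory          = 1G
request_disk            = 40G

# No shared filesystem, so files must be transferred explicitly
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT_OR_EVICT

# queue must be the last line; anything after it is ignored
queue 1
EOF

# Submit from the access point (requires HTCondor):
#   condor_submit job.sub
```

The quoted heredoc delimiter ('EOF') keeps the shell from expanding $(Cluster) and $(Process), so HTCondor receives them unchanged and fills them in at submit time.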

How to Monitor Jobs

  • The command to monitor your jobs in the queue is below. Jobs are summarized in batches and only jobs associated with your username are displayed.

    condor_q
    
  • To list each job individually rather than in batches, use the -nobatch option.

    condor_q -nobatch
    
  • If a job is on hold, the job status will be listed as H. View a list of your “on hold” jobs and the reason why they are on hold with the -hold option.

    condor_q -hold
    
  • Refer to the HTCondor documentation - condor_q page for a complete list of queue options.
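
For scripted checks, the held-job listing can be reduced to one line per job. The sketch below assumes condor_q's -autoformat (-af) option and the standard job attributes ClusterId, ProcId, and HoldReason; the function names list_held_jobs and count_held_jobs are hypothetical helpers, not HTCondor commands.

```shell
# Print one line per held job: ClusterId, ProcId, then the hold reason.
list_held_jobs() {
    condor_q -hold -af ClusterId ProcId HoldReason
}

# Count held jobs by counting the lines printed above.
# tr strips the padding some wc implementations add.
count_held_jobs() {
    list_held_jobs | wc -l | tr -d ' '
}

# Usage on the access point (requires HTCondor):
#   list_held_jobs
#   count_held_jobs
```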

How to Release or Remove Jobs in the Queue

  • Release an “on hold” job back into the idle queue with the condor_release command below. (Replace JobID with the JobID you want to release.)

    condor_release JobID
    
  • Manually remove a job from the queue with the condor_rm command. (Replace JobID with the JobID you want to remove.)

    condor_rm JobID
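
As an illustration of the JobID format, the sketch below checks why a job is on hold before releasing it. HTCondor job IDs take the form ClusterId.ProcId (for example 1234.0), and a bare ClusterId refers to every job in that cluster. The function name review_and_release and the ID 1234.0 are hypothetical. Releasing a job only helps if the cause of the hold has been addressed; otherwise the job is likely to be held again.

```shell
# Show the hold reason for a job, then release it back to the idle queue.
# Argument: a JobID such as 1234.0 (ClusterId.ProcId) or 1234 (whole cluster).
review_and_release() {
    job_id="$1"
    # Stop here if the queue cannot be queried
    condor_q -hold "$job_id" || return 1
    condor_release "$job_id"
}

# Usage on the access point (requires HTCondor):
#   review_and_release 1234.0
```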