Running Jobs
Accessing the Compute Nodes
User access to the compute nodes for running jobs is available via a batch job. The Campus Cluster uses the Slurm Workload Manager for running batch jobs. See the Batch Commands section for details on batch job submission.
Please be aware that the interactive (login/head) nodes are a shared resource for all users of the system; their use should be limited to editing, compiling, and building your programs, and to short, non-intensive runs.
Note
User processes running on the interactive (login/head) nodes are killed automatically if they accrue more than 30 minutes of CPU time or if more than 4 identical processes owned by the same user are running concurrently.
An interactive batch job provides a way to get interactive access to a compute node via a batch job. See the srun section for information on how to run an interactive job on the compute nodes.
To ensure the health of the batch system and scheduler, users should refrain from having more than 1,000 batch jobs in the queues at any one time.
See the Running Serial Jobs section for information on expediting job turnaround time for serial jobs.
See the Using MATLAB / Running Mathematica Batch Jobs sections for information on running MATLAB and Mathematica on the Campus Cluster.
Running Programs
Successful building (compilation and linking) of your program creates an executable that is used to run the program. The table below describes how to run different types of programs.
| Program Type | How to Run the Program/Executable | Example Command |
|---|---|---|
| Serial | To run serial code, specify the name of the executable. | `./a.out` |
| MPI | MPI programs are run with `srun` followed by the name of the executable. The total number of MPI processes is the {number of nodes} x {cores/node} set in the batch job resource specification. | `srun ./a.out` |
| OpenMP | The `OMP_NUM_THREADS` environment variable sets the number of threads used by OpenMP programs. If this variable is not set, the number of threads defaults to one under the Intel compiler; under GCC, the default is one thread for each core available on the node. To run OpenMP programs, specify the name of the executable. | `export OMP_NUM_THREADS=16; ./a.out` |
| MPI/OpenMP | As with OpenMP programs, the `OMP_NUM_THREADS` variable can be set to specify the number of threads used by the OpenMP portion of the mixed MPI/OpenMP program; the same default behavior applies. Use `srun` to run mixed MPI/OpenMP programs. The number of MPI processes per node is set in the batch job resource specification for number of cores/node. | `export OMP_NUM_THREADS=4; srun ./a.out` |
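Putting the resource specification and run commands together, a minimal hybrid MPI/OpenMP batch script might look like the following sketch (the executable name `a.out` and the account name are placeholders; `--cpus-per-task` reserves cores for the OpenMP threads of each MPI process):

```shell
#!/bin/bash
#SBATCH --account=account_name   # <- replace with an account available to you
#SBATCH --time=00:30:00          # Wall clock limit (hh:mm:ss)
#SBATCH --nodes=2                # Number of nodes
#SBATCH --ntasks-per-node=4      # MPI processes per node
#SBATCH --cpus-per-task=4        # Cores (OpenMP threads) per MPI process

cd ${SLURM_SUBMIT_DIR}

# One OpenMP thread per core reserved for each MPI process
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# 2 nodes x 4 tasks/node = 8 MPI processes, each running 4 threads
srun ./a.out
```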
Queues
Primary Queues
Each investor group has unrestricted access to a dedicated primary queue with concurrent access to the number and type of nodes in which they invested.
Users can view the partitions (queues) that they can submit batch jobs to with the following command:
[cc-login1 ~]$ sinfo -s -o "%.25R %.12l %.12L %.5D"
Users can also view specific configuration information about the compute nodes associated with their primary partition(s) with the following command (replace <partition_name> with the name of the partition):
[cc-login1 ~]$ sinfo -p <partition_name> -N -o "%.8N %.4c %.25P %.9m %.12l %.12L %G"
Secondary Queues
One of the advantages of the Campus Cluster Program is the ability to share resources. A shared secondary queue will allow users access to any idle nodes in the cluster. Users must have access to a primary queue to be eligible to use the secondary queue.
While each investor has full access to the number and type of nodes in which they invested, those resources not fully utilized by each investor will become eligible to run secondary queue jobs. If there are resources eligible to run secondary queue jobs but there are no jobs to be run from the secondary queue, jobs in the primary queues that fit within the constraints of the secondary queue may be run on any otherwise appropriate idle nodes. The secondary queue uses fairshare scheduling.
| Queue | Max Walltime | Max # Nodes |
|---|---|---|
| secondary | 4 hours | 305 |
| secondary-Eth | 4 hours | 21 |
Jobs are routed to the secondary queue when a queue is not specified; i.e., the secondary queue is the default queue on the Campus Cluster.
The difference between the secondary and secondary-Eth queues is that the compute nodes in the secondary queue are interconnected via InfiniBand (IB), while those in the secondary-Eth queue are interconnected via Ethernet. Ethernet is currently slower than InfiniBand, but this only matters for performance if your batch jobs use multiple nodes and need to communicate between nodes (as with MPI codes), or have heavy file system I/O requirements.
Batch Commands
Below are brief descriptions of the primary batch commands. For more detailed information, refer to the individual man pages.
sbatch
Batch jobs are submitted through a job script using the sbatch
command.
Job scripts generally start with a series of SLURM directives that describe the requirements of the job, such as the number of nodes and wall time required, to the batch system/scheduler. SLURM directives can also be specified as options on the sbatch
command line; command line options take precedence over those in the script.
The rest of the batch script consists of user commands.
Sample batch scripts are available in the directory /projects/consult/slurm
.
The syntax for sbatch
is:
sbatch [list of sbatch options] script_name
The main sbatch
options are listed below. See the sbatch
man page for more options.
--account=account_name
    account_name is the name of an account available to you. If you don’t know the account(s) available to you, ask your technical representative or submit a support request.

--time=time
    time is the maximum wall clock time (d-hh:mm:ss) [default: the maximum limit of the queue (partition) submitted to]

--nodes=n
    n is the number of 16/20/24/28/40/128-core nodes [default: 1 node]

--ntasks=p
    Total number of cores for the batch job. p is how many cores to use (1 through 40) [default: 1 core]

--ntasks-per-node=p
    Number of cores per node (same as ppn under PBS). p is how many cores per node to use (1 through 40) [default: 1 core]
Example:
--account=account_name   # <- replace "account_name" with an account available to you
--time=00:30:00
--nodes=2
--ntasks=32

or

--account=account_name   # <- replace "account_name" with an account available to you
--time=00:30:00
--nodes=2
--ntasks-per-node=16
Memory needs
For investor groups that have nodes with varying amounts of memory, or to run in the secondary queue, nodes with a specific amount of memory can be targeted. The compute nodes have memory configurations of 64GB, 128GB, 192GB, 256GB, or 384GB. Not all memory configurations are available in all investor queues.
For a list of all the nodes you have access to, with information about CPUs and memory, execute:
sinfo -N -l
You can also check with the technical representative of your investor group to determine what memory configurations are available for the nodes in your primary queue.
Warning
Do not use the memory specification unless absolutely required, since it could delay scheduling of the job; also, if nodes with the specified memory are unavailable in the specified queue, the job will never run.
Example:
--account=account_name   # <- replace "account_name" with an account available to you
--time=00:30:00
--nodes=2
--ntasks=32
--mem=118000

or

--account=account_name   # <- replace "account_name" with an account available to you
--time=00:30:00
--nodes=2
--ntasks-per-node=16
--mem-per-cpu=7375
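The numbers in the two examples above are related by simple division: a per-node request of 118000 MB spread over 16 cores gives 7375 MB per core. A small helper function (hypothetical, for illustration only) reproduces the arithmetic:

```shell
# Convert a per-node memory request (MB) into an equivalent
# --mem-per-cpu value for a given number of cores per node.
mem_per_cpu() {
    local node_mem_mb=$1
    local cores_per_node=$2
    echo $(( node_mem_mb / cores_per_node ))
}

mem_per_cpu 118000 16   # 7375, matching the --mem-per-cpu example above
```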
Specifying nodes with GPUs
To run jobs on nodes with GPUs, add a GPU resource specification: TeslaM2090 (for Tesla M2090), TeslaK40M (for Tesla K40M), K80 (for Tesla K80), P100 (for Tesla P100), V100 (for Tesla V100), TeslaT4 (for Tesla T4), or A40 (for NVIDIA A40). This is needed if your primary queue has nodes with multiple types of GPUs, nodes with and without GPUs, or if you are submitting jobs to the secondary queue. Through the secondary queue, any user can access the nodes configured with any of these GPUs.
Example:
--gres=gpu:V100

or

--gres=gpu:V100:2

to specify two V100 GPUs (the default is 1 if no number is specified after the GPU type).
Note
Requesting more GPUs than what is available on a single compute node will result in a failed batch job submission.
To determine if GPUs are available on any of the compute nodes in your group’s partition(queue), run the below command (replace <partition_name>
with the name of the partition) or check with the technical representative of your investor group.
sinfo -p <partition_name> -N -o "%.8N %.4c %.16G %.25P %50f"
Useful Batch Job Environment Variables
| Description | SLURM Environment Variable | Detail Description | PBS Environment Variable (no longer valid) |
|---|---|---|---|
| JobID | `$SLURM_JOB_ID` | Job identifier assigned to the job. | `$PBS_JOBID` |
| Job Submission Directory | `$SLURM_SUBMIT_DIR` | By default, jobs start in the directory the job was submitted from, so a `cd` command is not needed. | `$PBS_O_WORKDIR` |
| Machine (node) list | `$SLURM_NODELIST` | Contains the list of nodes assigned to the batch job. | `$PBS_NODEFILE` |
| Array JobID | `$SLURM_ARRAY_JOB_ID`, `$SLURM_ARRAY_TASK_ID` | Each member of a job array is assigned a unique identifier (see the Job Arrays section). | `$PBS_ARRAYID` |
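These variables can be used directly inside a batch script; the sketch below prints the common ones, with `unset` fallbacks so it also runs harmlessly outside a batch job:

```shell
# Summarize the Slurm environment for the current job.
slurm_env_summary() {
    echo "Job ID:     ${SLURM_JOB_ID:-unset}"
    echo "Submit dir: ${SLURM_SUBMIT_DIR:-unset}"
    echo "Node list:  ${SLURM_NODELIST:-unset}"
}

slurm_env_summary
```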
See the sbatch
man page for additional environment variables available.
srun
The srun
command initiates an interactive job on the compute nodes.
For example, the following command will run an interactive job in the “ncsa” queue with a wall clock limit of 30 minutes, using one node and 16 cores per node. The compute time will be charged to the “account_name” account.
[cc-login1 ~]$ srun -A account_name --partition=ncsa --time=00:30:00 --nodes=1 --ntasks-per-node=16 --pty /bin/bash
You can also use other sbatch
options such as those documented above.
After you enter the command, you will have to wait for SLURM to start the job. As with any job, your interactive job will wait in the queue until the specified number of nodes is available. If you specify a small number of nodes for smaller amounts of time, the wait should be shorter because your job will backfill among larger jobs. You will see something like this:
srun: job 123456 queued and waiting for resources
Once the job starts, you will see the below and will be presented with an interactive shell prompt on the launch node:
srun: job 123456 has been allocated resources
At this point, you can use the appropriate command to start your program.
When you are done with your runs, you can use the exit
command to end the job.
squeue
| SLURM Example Command | Command Description |
|---|---|
| `squeue` | List the status of all jobs on the system. |
| `squeue -u $USER` | List the status of all your jobs in the batch system. |
| `squeue -j <JobID>` | List nodes allocated to a running job in addition to basic information. |
| `scontrol show job <JobID>` | List detailed information on a particular job. |
| `sinfo -s` | List summary information on all the queues. |
See the man page for other options available.
scancel
The scancel
command deletes a queued job or kills a running job.
scancel JobID
deletes/kills a job.
Job Dependencies
SLURM job dependencies allow users to set the order in which their queued jobs run.
Job dependencies are set by using the ‑‑dependency
option with the syntax being ‑‑dependency=<dependency type>:<JobID>
.
SLURM places the jobs in Hold state until they are eligible to run.
The following are examples on how to specify job dependencies using the afterany
dependency type, which indicates to SLURM that the dependent job should become eligible to start only after the specified job has completed.
On the command line:
[cc-login1 ~]$ sbatch --dependency=afterany:<JobID> jobscript.sbatch
In a job script:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --account=account_name # <- replace "account_name" with an account available to you
#SBATCH --job-name="myjob"
#SBATCH --partition=secondary
#SBATCH --output=myjob.o%j
#SBATCH --dependency=afterany:<JobID>
In a shell script that submits batch jobs:
#!/bin/bash
JOB_01=`sbatch jobscript1.sbatch |cut -f 4 -d " "`
JOB_02=`sbatch --dependency=afterany:$JOB_01 jobscript2.sbatch |cut -f 4 -d " "`
JOB_03=`sbatch --dependency=afterany:$JOB_02 jobscript3.sbatch |cut -f 4 -d " "`
...
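The `cut -f 4 -d " "` in the script above relies on sbatch printing `Submitted batch job <JobID>` on submission; a small helper makes that parsing step explicit:

```shell
# Extract the job ID from sbatch's default submission message,
# e.g. "Submitted batch job 123456" -> "123456".
parse_jobid() {
    cut -d ' ' -f 4
}

echo "Submitted batch job 123456" | parse_jobid   # prints 123456
```

Recent Slurm versions also provide `sbatch --parsable`, which prints only the job ID and removes the need for field extraction.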
Generally, the recommended dependency types to use are after
, afterany
, afternotok
, and afterok
.
While there are additional dependency types, those types that work based on batch job error codes may not behave as expected because of the difference between a batch job error and application errors.
See the dependency section of the sbatch
manual page for additional information (man sbatch
).
Job Constraints
Use the --constraint
option to specify required features for a job. Refer to the Slurm srun --constraint documentation for more details. (You can also find the same information in the Slurm sbatch documentation and Slurm salloc documentation.)
Features available on Campus Cluster include:
CPU type (AE7713, E2680V4, G2348, …)
GPU type (NoGPU, P100, K80, …)
Memory (64G, 128G, 256G, 512G, …)
Interconnect (E1G, E10G, FDR, HDR, …)
Run the sinfo
command below to see a full list of features for nodes that are in queues that you can submit to:
sinfo -N --format="%R (%N): %f" -S %R | more
If a constraint(s) cannot be satisfied, your job will not run and squeue
will return BadConstraints
; refer to the Slurm squeue documentation.
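As a sketch, constraints can be combined in a job script with Slurm's `&` (AND) and `|` (OR) operators; the feature names below are taken from the list above and should be checked against the `sinfo` output for your queues:

```shell
# Require nodes that have both 192G of memory and an HDR interconnect
#SBATCH --constraint="192G&HDR"

# Alternatively: accept nodes with either a P100 or a K80 GPU
#SBATCH --constraint="P100|K80"
```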
Job Arrays
If a need arises to submit the same job to the batch system multiple times, instead of issuing one sbatch
command for each individual job, users can submit a job array.
Job arrays allow users to submit multiple jobs with a single job script using the ‑‑array
option to sbatch
.
An optional slot limit
can be specified to limit the number of jobs that can run concurrently in the job array.
See the sbatch
manual page for details (man sbatch
).
The file names for the input, output, and so on, can be varied for each job using the job array index value defined by the SLURM environment variable SLURM_ARRAY_TASK_ID
.
A sample batch script that makes use of job arrays is available in /projects/consult/slurm/jobarray.sbatch
.
A few things to keep in mind:
Valid specifications for job arrays are:

--array 1-10
--array 1,2,6-10
--array 8
--array 1-100%5 (a limit of 5 jobs can run concurrently)

You should limit the number of batch jobs in the queues at any one time to 1,000 or less (each job within a job array is counted as one batch job).
Interactive batch jobs are not supported with job array submissions.
To delete job arrays, see the scancel command section.
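A minimal array job script can use `SLURM_ARRAY_TASK_ID` to select per-task input and output files, as in this sketch (the `input.N.dat` naming and the `a.out` executable are hypothetical); it would be submitted with, for example, `sbatch --array=1-10 jobscript.sbatch`:

```shell
#!/bin/bash
# Derive this array member's file names from its task index.
# The :-1 default lets the logic be exercised outside a batch job too.
TASK=${SLURM_ARRAY_TASK_ID:-1}
INPUT="input.${TASK}.dat"
OUTPUT="output.${TASK}.dat"

echo "task ${TASK}: ${INPUT} -> ${OUTPUT}"
# ./a.out < ${INPUT} > ${OUTPUT}
```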
Running Serial Jobs
Users often have a number of single-core (serial) jobs to run. Since Campus Cluster nodes have multiple cores (16/20/24/28/40/56 cores per node), using these resources efficiently means running several such jobs on a single node. This can be done with multiple batch jobs (one per serial process) or combined within a single batch job.
Keep memory needs in mind when deciding how many serial processes to run concurrently on a node. If you are running your jobs in the secondary queue, also be aware that the compute nodes on the Campus Cluster have different amounts of memory. To avoid overloading a node, make sure that the memory required by the concurrent jobs or processes fits on the node. Assume that approximately 90% of the memory on a node is available for your jobs (the rest is needed for system processes).
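To make the memory guideline concrete, the helper below (hypothetical, for illustration) caps the number of concurrent serial processes by both usable memory and core count; on the smallest node type (16 cores, roughly 54000 MB usable), processes of 3375 MB each fill the node exactly:

```shell
# How many serial processes fit on a node: limited by usable memory
# (usable_mem_mb / per_proc_mb) and by the number of cores.
max_procs() {
    local usable_mem_mb=$1 per_proc_mb=$2 cores=$3
    local by_mem=$(( usable_mem_mb / per_proc_mb ))
    if [ "$by_mem" -lt "$cores" ]; then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}

max_procs 54000 3375 16   # 16: memory and cores balance exactly
max_procs 54000 7000 16   # 7: memory is the limiting factor
```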
Multiple Batch Jobs
The --mem-per-cpu option in Slurm can be used to submit serial jobs so that they run concurrently on a node. Starting with a memory amount of 3375 megabytes, a node specification of 1 (--nodes), and a specification of 1 for the tasks per node (--ntasks-per-node), jobs owned by the same user can share a node. Multiple serial jobs submitted this way will then be scheduled to run concurrently on a node.
The following Slurm specification will schedule multiple serial jobs on one node:
#!/bin/bash
#SBATCH --time=00:05:00 # Job run time (hh:mm:ss)
#SBATCH --account=account_name # Replace "account_name" with an account available to you
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=1 # Number of tasks (cores/ppn) per node
#SBATCH --mem-per-cpu=3375 # Memory per core (value in MBs)
<other sbatch options>
#
cd ${SLURM_SUBMIT_DIR}
# Run the serial executable
./a.out < input > output
Increasing the --mem-per-cpu value for each job will cause fewer jobs to be scheduled concurrently on a single compute node, resulting in more memory available to each job.
The above sbatch specifications are based on the smallest compute node configuration: a compute node with 16 cores and 64GB of memory (54000 MB usable).
Single Batch Job
Specify the maximum value for --ntasks-per-node
as an sbatch option/specification for a batch job and execute multiple serial processes within a one-node batch job. This works best if all the processes are estimated to take approximately the same amount of time to complete because the batch job waits to exit until all the processes finish. The basic template for the job script would be:
#!/bin/bash
#SBATCH --time=00:05:00 # Job run time (hh:mm:ss)
#SBATCH --account=account_name # Replace "account_name" with an account available to you
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=16 # Number of tasks (cores/ppn) per node
#SBATCH --job-name=multi-serial_job # Name of batch job
#SBATCH --partition=secondary # Partition (queue)
#SBATCH --output=multi-serial.o%j # Name of batch job output file
executable1 &
executable2 &
.
.
.
executable16 &
wait
The ampersand (&) at the end of each command indicates the process will be backgrounded and allows 16 processes to start concurrently. The wait command at the end is important so the shell waits until the background processes are complete (otherwise the batch job will exit right away). The commands can also be handled in a do loop depending on the specific syntax of the processes.
When running multiple processes in a job, the total number of processes should generally not exceed the number of cores. Also be aware of memory needs so as not to run a node out of memory.
The following example batch script runs 16 instances of Matlab concurrently:
#!/bin/bash
#SBATCH --time=00:30:00 # Job run time (hh:mm:ss)
#SBATCH --account=account_name # Replace "account_name" with an account available to you
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=16 # Number of tasks (cores/ppn) per node
#SBATCH --job-name=matlab_job # Name of batch job
#SBATCH --partition=secondary # Partition (queue)
#SBATCH --output=multi-serial.o%j # Name of batch job output file
cd ${SLURM_SUBMIT_DIR}
module load matlab
for (( i=1; i<=16; i++))
do
matlab -nodisplay -r num.9x2.$i > output.9x2.$i &
done
wait
Using MATLAB
Introduction
MATLAB (MATrix LABoratory) is a high-level language and interactive environment for numerical computation, visualization, and programming. Developed by MathWorks, MATLAB allows you to analyze data, develop algorithms, and create models and applications.
MATLAB is available on the Campus Cluster along with a collection of toolboxes all of which are covered by a campus concurrent license.
Versions
The table below lists the versions of MATLAB installed on the Campus Cluster.
| Version | Release Name |
|---|---|
| MATLAB 9.7 | 2019b |
| MATLAB 9.5 | 2018b |
| MATLAB 9.4 | 2018a |
Adding MATLAB to Your Environment
Each MATLAB installation on the Campus Cluster has a module that you can use to load a specific version of MATLAB into your user environment.
You can see the available versions of MATLAB by typing module avail matlab
on the command line.
The latest version of MATLAB can be loaded into your environment by typing module load matlab
.
To load a specific version, you will need to load the corresponding module.
See the Managing Your Environment (Modules) section for more information about modules.
The MATLAB modules make the corresponding MATLAB product as well as all the installed toolboxes available to the user environment.
To verify which toolboxes are available (and the MATLAB version), type ver
at the prompt of an interactive MATLAB session.
Running MATLAB Batch Jobs
Execution of MATLAB should be restricted to compute nodes that are part of a batch job. For detailed information about running jobs on the Campus Cluster, see Running Jobs.
Standard batch job
A sample batch script that runs a single MATLAB task with a single m-file
is available in /projects/consult/slurm/matlab.sbatch
that you can copy and modify for your own use. Submit the job with:
[cc-login1 ~]$ sbatch matlab.sbatch
Interactive batch job
For the GUI (which will display on your local machine), use the --x11
option with the srun
command. Replace account_name
with the name of an account available to you. If you don’t know the account(s) available to you, ask your technical representative or submit a support request.
srun -A account_name --x11 --export=All --time=00:30:00 --nodes=1 --cpus-per-task=16 --partition=secondary --pty /bin/bash
Once the batch job starts, you will have an interactive shell prompt on a compute node. Then type:
module load matlab
matlab
An X-Server must be running on your local machine with X11 forwarding enabled within your SSH connection in order to display X-Apps, GUIs, and so on, back on your local machine.
Generally, users on Linux-based machines only have to enable X11 forwarding by passing an option (-X or -Y) to the SSH command. Users on Windows machines will need to ensure that their SSH client has X11 forwarding enabled and have an X-Server running.
A list of SSH clients (which includes a combo packaged SSH client and X-Server) can be found in the SSH Clients section.
Additional information about running X applications can be found on the Using the X Window System page.
For the command line interface:
srun -A account_name --export=All --time=00:30:00 --nodes=1 --cpus-per-task=16 --partition=secondary --pty /bin/bash
(Replace account_name
with the name of an account available to you. If you don’t know the account(s) available to you, ask your technical representative or submit a support request.)
Once the batch job starts, you will have an interactive shell prompt on a compute node. Then type:
module load matlab
matlab -nodisplay
Parallel MATLAB
The Parallel Computing Toolbox (PCT) lets you solve computationally and data-intensive problems using multicore processors. High-level constructs, parallel for loops, special array types, and parallelized numerical algorithms let you parallelize MATLAB applications without MPI programming. Under MATLAB versions 8.4 and earlier, this toolbox provides 12 workers (MATLAB computational engines) to execute applications locally on a single multicore node of the Campus Cluster. Under MATLAB version 8.5 and later, the number of workers available is equal to the number of cores on a single node (up to a maximum of 512). See MATLAB Errors for error messages generated when violating this limit.
When submitting multiple parallel MATLAB jobs on the Campus Cluster, a race condition can occur if two or more jobs start at the same time and try to write temporary MATLAB job information to the same location. This race condition can cause one or more of the parallel MATLAB jobs to fail to use the parallel functionality of the toolbox. See MATLAB Errors for error messages generated when this occurs. Note that non-parallel MATLAB jobs do not suffer from this race condition.
To avoid this behavior, the start times of the parallel MATLAB jobs can be staggered by submitting each subsequent job to the batch system with the --dependency=after:<JobID>
option (see the Job Dependencies section for more information about this option).
sbatch parallel.ML-job.sbatch
sbatch --dependency=after:JobID.01 parallel.ML-job.sbatch
sbatch --dependency=after:JobID.02 parallel.ML-job.sbatch
...
sbatch --dependency=after:JobID.NN parallel.ML-job.sbatch
Note
The MATLAB Distributed Computing Server (MDCS) is not installed on the Campus Cluster because the latest versions of MDCS are not covered under the campus concurrent license. Therefore, MATLAB jobs are restricted to the parallel computing functionality of MATLAB’s Parallel Computing Toolbox.
The UI WebStore offers MDCS for release 2010b - you can contact them directly at webstore@illinois.edu for information and download instructions (for use with release 2010b only - it is not compatible for use with other versions).
MATLAB matlabpool is no longer available
The matlabpool
function is not available in MATLAB version 8.5 (R2015a).
The parpool
function should be used instead.
Additional information can be found in the Parallel Computing Toolbox release notes.
MATLAB Errors
The following are some example errors encountered when running MATLAB on the Campus Cluster using MATLAB versions <= 8.4 and versions >= 8.5.
Trying to start a matlabpool (or parpool) with more than 12 workers
MATLAB version <= 8.4: Error message generated when trying to start a matlabpool with more than 12 workers:

matlabpool('open', 24)
>> Starting matlabpool using the 'local' profile ... stopped.
Error using matlabpool (line 144)
Failed to open matlabpool. (For information in addition to the causing error, validate the profile 'local' in the Cluster Profile Manager.)
Caused by:
    Error using distcomp.interactiveclient/start (line 88)
    Failed to start matlabpool. This is caused by:
    You requested a minimum of 24 workers, but only 12 workers are allowed with the Local cluster.
MATLAB version >= 8.5: Error message generated when trying to start a parpool with more than 12 workers:

parpool('local', 24)
>> Starting parallel pool (parpool) using the 'local' profile ...
Error using parpool (line 103)
You requested a minimum of 24 workers, but the cluster "local" has the NumWorkers property set to allow a maximum of 12 workers. To run a communicating job on more workers than this (up to a maximum of 512 for the Local cluster), increase the value of the NumWorkers property for the cluster. The default value of NumWorkers for a Local cluster is the number of cores on the local machine.
Trying to start a matlabpool (or parpool) with 12 workers using 2 nodes and 6 ppn/node
MATLAB version <= 8.4: Error message generated when trying to start a matlabpool with 12 workers using 2 nodes and 6 ppn/node:

matlabpool('open', 12)
>> Starting matlabpool using the 'local' profile ...
Error using matlabpool (line 148)
Failed to start a parallel pool. (For information in addition to the causing error, validate the profile 'local' in the Cluster Profile Manager.)
Caused by:
    Error using parallel.internal.pool.InteractiveClient/start (line 326)
    Failed to start pool.
    Error using parallel.Job/submit (line 304)
    You requested a minimum of 12 workers, but the cluster "local" has the NumWorkers property set to allow a maximum of 6 workers. To run a communicating job on more workers than this (up to a maximum of 12 for the Local cluster), increase the value of the NumWorkers property for the cluster. The default value of NumWorkers for a Local cluster is the number of cores on the local machine.
MATLAB version >= 8.5: Error message generated when trying to start a parpool with 12 workers using 2 nodes and 6 ppn/node:

parpool('local', 12)
>> Starting parallel pool (parpool) using the 'local' profile ...
Error using parpool (line 103)
You requested a minimum of 12 workers, but the cluster "local" has the NumWorkers property set to allow a maximum of 6 workers. To run a communicating job on more workers than this (up to a maximum of 512 for the Local cluster), increase the value of the NumWorkers property for the cluster. The default value of NumWorkers for a Local cluster is the number of cores on the local machine.
When 2 or more parallel MATLAB jobs start at the same time (see the Parallel MATLAB section for details)
MATLAB version <= 8.4:
Example 1
Error using matlabpool (line 144)
Failed to open matlabpool. (For information in addition to the causing error, validate the profile 'local' in the Cluster Profile Manager.)
Caused by:
    Error using distcomp.interactiveclient/start (line 88)
    Failed to start matlabpool. This is caused by:
    A communicating job must have a single task defined before submission.
Example 2
Error using matlabpool (line 144)
Failed to open matlabpool. (For information in addition to the causing error, validate the profile 'local' in the Cluster Profile Manager.)
Caused by:
    Error using distcomp.interactiveclient/start (line 88)
    Failed to start matlabpool. This is caused by:
    Can't write file /home//.matlab/local_cluster_jobs/R2012a/Job2.in.mat.
MATLAB version >= 8.5:
Example 1
>> Starting parallel pool (parpool) using the 'local' profile ...
Error using parpool (line 103)
Failed to start a parallel pool. (For information in addition to the causing error, validate the profile 'local' in the Cluster Profile Manager.)
Caused by:
    Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line 667)
    Failed to start pool.
    Error using parallel.Job/createTask (line 277)
    Only one task may be created on a communicating Job.
Example 2
>> Starting parallel pool (parpool) using the 'local' profile ...
Error using parpool (line 103)
Failed to start a parallel pool. (For information in addition to the causing error, validate the profile 'local' in the Cluster Profile Manager.)
Caused by:
    Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line 667)
    Failed to start pool.
    Error using parallel.Cluster/createCommunicatingJob (line 92)
    The storage metadata file is corrupt. Please delete all files in the JobStorageLocation and try again.
Running Mathematica Batch Jobs
Standard batch job
A sample batch script that runs a Mathematica script is available in /projects/consult/slurm/mathematica.sbatch
.
You can copy and modify this script for your own use. Submit the job with:
[cc-login1 ~]$ sbatch mathematica.sbatch
In an interactive batch job
For the GUI (which will display on your local machine), use the --x11
option with the srun
command. (Replace account_name
with the name of an account available to you. If you don’t know the account(s) available to you, ask your technical representative or submit a support request.)
srun -A account_name --x11 --export=All --time=00:30:00 --nodes=1 --ntasks-per-node=16 --partition=secondary --pty /bin/bash
Once the batch job starts, you will have an interactive shell prompt on a compute node. Then type:
module load mathematica
mathematica
An X-Server must be running on your local machine with X11 forwarding enabled within your SSH connection in order to display X-Apps, GUIs, and so on, back on your local machine.
Generally, users on Linux-based machines only have to enable X11 forwarding by passing an option (-X or -Y) to the SSH command. Users on Windows machines will need to ensure that their SSH client has X11 forwarding enabled and an X-Server is running.
A list of SSH clients (which includes a combo packaged SSH client and X-Server) can be found in the SSH Clients section.
Additional information about running X applications can be found on the Using the X Window System page.
For the command line interface:
srun -A account_name --export=All --time=00:30:00 --nodes=1 --ntasks-per-node=16 --partition=secondary --pty /bin/bash
(Replace account_name
with the name of an account available to you. If you don’t know the account(s) available to you, ask your technical representative or submit a support request.)
Once the batch job starts, you will have an interactive shell prompt on a compute node. Then type:
module load mathematica
math