Running Batch Jobs
Batch jobs are run by submitting a job script to the scheduler. The script contains the commands needed to set up your environment and run your application. Users typically create or edit job scripts using a text editor such as vi, emacs or nano.
To submit jobs on Peregrine, the Torque qsub command should be used:
```
% qsub <batch_file> -A <project-handle>
```
The job script may contain instructions for the job scheduler, preceded by "#PBS". These directives may be used to specify resource limits such as wall clock time and number of nodes, as well as the queue and the kind of nodes the job should run on. The same options may be provided on the qsub command line; if both are present, command-line options to qsub take precedence over the corresponding options in the batch script.
All jobs must specify a project handle that is associated with a project allocation. This may be done with the -A option to qsub or as an option within the script. Jobs submitted without a project handle will be rejected. This prevents jobs with an incorrect spelling of the project handle from sitting in the queue indefinitely.
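A minimal job script might then look like the following sketch; the project handle `csc000` is a placeholder, and the final commands stand in for a real application launch:

```shell
#!/bin/bash
#PBS -A csc000                # placeholder project handle; use your own
#PBS -l nodes=2:ppn=24        # 2 nodes with 24 processes per node
#PBS -l walltime=0:04:00:00   # 4-hour wall clock limit
#PBS -j oe                    # join stderr and stdout in one file

# Torque sets PBS_O_WORKDIR to the directory the job was submitted from;
# defaulting to "." lets the script be tested outside the batch system.
cd "${PBS_O_WORKDIR:-.}"
echo "Job running in $(pwd)"
```

The `#PBS` lines are ordinary comments to the shell, so the script can also be run directly while debugging.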
Frequently used options:
| Option | What it does |
| --- | --- |
| `-d <execution directory>` | tells Torque where the job should execute |
| `-I` (capital i) | runs the job interactively |
| `-X` | displays X windows on your system (undocumented flag) |
| `-l` (lowercase L) | sets resource limits (see below for more information) |
| `-V` | exports your environment variables to the batch job |
| `-j oe` | joins stderr and stdout in one file |
| `-A <project-handle>` | tells Torque what project allocation should be charged for the job's node-hour usage |
| `-q <queue>` | submits the job to a specific queue |
A variety of environment variables are made available for use by your script.
- The environment variable $PBS_O_WORKDIR is set to the location the job was submitted from.
- $PBS_NODEFILE points to a file containing a list of nodes allocated to the job.
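As a sketch, a script can combine the two variables to return to the submission directory and size a launch from the node file, which contains one line per allocated core; the defaults exist only so the snippet can be tried outside the batch system:

```shell
#!/bin/bash
# Count allocated cores: PBS_NODEFILE lists one hostname per core.
NODEFILE="${PBS_NODEFILE:-/dev/null}"
NPROCS=$(wc -l < "$NODEFILE")

# Run from the directory the job was submitted from.
cd "${PBS_O_WORKDIR:-.}"
echo "Launching $NPROCS processes from $(pwd)"
```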
Resource limits, such as the number of nodes and the wall clock time, may be set with the -l option to qsub or #PBS -l in the job script.
- `-l nodes=n:ppn=X` says the job needs n nodes and should place X processes on each node
- `-l walltime=DD:HH:MM:SS` sets the wall clock limit for the job
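For example, a hypothetical 4-node, 12-hour request could be made on the command line (`csc000` and `myjob.sh` are placeholders):

```shell
% qsub -l nodes=4:ppn=24,walltime=0:12:00:00 -A csc000 myjob.sh
```

or, equivalently, inside the script:

```shell
#PBS -l nodes=4:ppn=24
#PBS -l walltime=0:12:00:00
```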
Peregrine has several types of compute nodes, which differ in the number of processor cores and the amount of memory. The majority of the nodes have 24 Xeon cores and 32 GB of memory, but some have 24 Xeon cores and 64 GB of memory, some have 16 Xeon cores and 32 GB of memory, and others have 16 Xeon cores and 256 GB of memory.
Users may request nodes of a particular type using the "feature" option in the resource limit specification of the job. By default, jobs will use the first node type found that is consistent with the job request.
- For 16-core nodes, use `-lfeature=16core`.
- For 24-core nodes, use `-lfeature=24core`.
- For large-memory 24-core nodes, use `-lfeature=64GB`.
- For very-large-memory 16-core nodes, use `-lfeature=256GB`.
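For example, a job script asking for two of the very-large-memory nodes might contain the following directives (only the relevant lines are shown):

```shell
#PBS -l nodes=2:ppn=16
#PBS -l feature=256GB
```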
If you request an inconsistent feature set (e.g., 16core and 64GB), the job cannot be scheduled and will remain queued indefinitely. When such a job is seen, an admin will attempt to contact the job owner to rectify the incompatibility and get the job running.
More information about different node types in Peregrine is available.
Applications should be run from the /scratch file system. Program executable files may reside in any file system but input and output files should be read from or written to the /scratch file system.
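In practice, this usually means changing to a run directory on /scratch before launching the application; the paths and application name below are hypothetical examples:

```shell
# Hypothetical run directory; the executable may live elsewhere (e.g., /home)
cd /scratch/$USER/my_case
/home/$USER/bin/my_app input.dat > output.dat
```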
In order to meet the needs of different types of jobs, nodes on Peregrine are available through three different job queues.
- Nodes in the short queue are intended for jobs that run for up to 4 hours. A user may use up to 8 nodes from this queue at the same time.
- Nodes in the debug queue are intended for jobs that need fast turnaround for debugging problems. A user may use up to 4 nodes from this queue for up to 1 hour.
- 4 nodes associated with the debug queue have 256 GB of memory. If you request the debug queue and specify the 256GB feature, your job will land on these nodes. If you request the debug queue and don't specify the 256GB feature, your job may use those nodes if they are available.
- Interactive jobs may be run using nodes in any queue but the wait time should be shortest if they are submitted to the debug queue.
- If no queue is specified, the job will run in the default (batch) queue. Jobs in the default queue may run for up to 5 days.
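For example, a one-node interactive session in the debug queue could be requested as follows (the project handle is a placeholder):

```shell
% qsub -I -q debug -l nodes=1:ppn=24,walltime=0:01:00:00 -A csc000
```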