Using Slurm
Submitting Jobs
Jobs are submitted to the Slurm batch system by issuing the command
sbatch <SCRIPT>
where <SCRIPT> is a shell script whose header can contain additional parameters to configure the job. All jobs are constrained to the requested amount of time, CPUs, and memory.
Memory Limit
The default memory limits are deliberately set comparatively low. To find an appropriate limit for your job, first submit a test job requesting a large memory limit. Afterwards you can check the actual memory usage of the finished job with the command
sacct -o MaxRSS -j JOBID
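For example, a test run could first be submitted with a generous limit and then inspected after it has finished. A minimal sketch (the script name myjob.sh, the 8192 MB limit, and the job ID 1234567 are placeholders):
sbatch --mem-per-cpu=8192 myjob.sh
sacct -o JobID,MaxRSS,Elapsed -j 1234567
The MaxRSS column reports the peak resident memory of the finished job; add a safety margin to that value and use it as the memory limit for production runs.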
Walltime Limit
As with the memory limit, the default walltime limit is also set to a quite short time. Please estimate in advance how long your job will run and set the time limit accordingly.
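The elapsed time of a finished job can be checked in the same way as the memory usage, for example (JOBID is a placeholder):
sacct -o JobID,Elapsed,Timelimit -j JOBID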
Example: Single-Core Job
The following script describes a simple job requesting one CPU core and 1GB of memory with a running time of 15 hours:
#!/bin/bash

#SBATCH -o /home/myusername/output.%j.%N.log   # standard output and error are written to this file; it should not be located on our Lustre file system
#SBATCH -p medium                              # run the job in the partition (queue) medium
#SBATCH -n 1                                   # number of cores
#SBATCH -t 15:00:00                            # running time in hours
#SBATCH --mem-per-cpu=1024                     # request 1 GB of memory per core
#SBATCH --get-user-env                         # use the environment variables of the interactive login

echo "I'm running on node $HOSTNAME"           # the actual command to be executed
All options provided in the submission script can also be passed directly as parameters to sbatch.
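For example, the single-core job above could equivalently be submitted with all options on the command line (a sketch using the same options as the script; myjob.sh is a placeholder for the script without the #SBATCH header lines):
sbatch -o /home/myusername/output.%j.%N.log -p medium -n 1 -t 15:00:00 --mem-per-cpu=1024 --get-user-env myjob.sh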
Examples: Multi-Core Jobs on One Node
The following sbatch options submit a job requesting one task with 4 cores on one node. The overall requested memory on the node is 4 GB:
sbatch -n 1 --cpus-per-task 4 --mem=4000 <SCRIPT>
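Inside such a job the four cores are typically used by a single multi-threaded program. A minimal sketch for the job script, assuming an OpenMP program ./myprog (a placeholder):
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # one thread per requested core
./myprog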
The following sbatch options submit a job requesting 4 tasks, each with one core, on one node. The overall requested memory on the node is 4 GB:
sbatch -n 4 --mem=4000 <SCRIPT>
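This form is typically used for programs that run as several independent tasks, for example with MPI. A minimal sketch for the job script (./myprog is a placeholder):
srun ./myprog   # starts one copy of the program per requested task (here: 4)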
The following option avoids the use of in-core multi-threading. It advises Slurm to allocate only one thread from each core to the job:
sbatch -n 4 --mem=4000 --hint=nomultithread <SCRIPT>
Job Scheduling
Job priorities are calculated using information about fairshare and the length of time a job has been waiting in the queue. The most important factor is the fairshare. A detailed description of how the fairshare priority is calculated can be found in the Slurm documentation.
The longer your job waits for execution in the queue, the higher its priority grows. To check your current fairshare status, use the following command
sshare -u <USERNAME>
To check your current job priority use the command
sprio -j <JOBID>
which provides details on how your job priority was calculated.
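If you would like an estimate of when a pending job is expected to start, squeue can report the scheduled start time. Note that this is only an estimate and may change as other jobs finish or are submitted:
squeue --start -j <JOBID>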
Comparison of Slurm and Torque Commands
The following table gives an overview of commonly used Slurm commands and their Torque equivalents:
Task | Slurm Command | Torque Command
Job submission | sbatch <SCRIPT> | qsub <SCRIPT>
Submit a job for execution or initiate job steps in real time | srun |
Allocate resources for a job in real time | salloc |
Cancel pending or running jobs | scancel <JOBID> | qdel <JOBID>
List queued jobs | squeue | qstat
Job status / details | scontrol show job <JOBID> | qstat <JOBID>
Job status by user | squeue -u <USERNAME> | qstat -u <USERNAME>
List queues (partitions) | sinfo | qstat -Q
List queue configuration details | scontrol show partition | qstat -Qf
List nodes | sinfo -N, scontrol show nodes | pbsnodes
GUI | smap (shell), sview (gui) |
Further Documentation
Detailed documentation is available from the Slurm web page: https://slurm.schedmd.com/