The simplest way to monitor your HEC jobs is with the qstat command. Used on its own, it lists all of your jobs that are currently running or waiting to run on the cluster:

job-ID  prior  name   user      state  submit/start at      queue           slots  ja-task-ID
---------------------------------------------------------------------------------------------
   198  0.500  myjob  testuser  r      07/30/2013 15:30:17  serial@comp005      1  1
   199  0.500  myjob  testuser  r      07/30/2013 15:30:17  serial@comp005      1  1
   200  0.500  myjob  testuser  qw     07/30/2013 15:30:11                      1  1


The output columns are fairly self-explanatory:

job-ID: A number that uniquely identifies your job within the job scheduling system. Use this number when you want to terminate a job with the qdel command.
prior: The user's current job priority, based upon current and recent cluster utilisation.
name: The job's name, as specified by the -N directive in the job submission script.
user: The username of the job owner.
state: Current job status: r (running), t (in transfer), qw (queued and waiting) or Eqw (in an error state while queued).
submit/start at: For waiting jobs, the time the job was submitted; for running jobs, the time the job started running.
queue: For running jobs, the queue and compute node the job is running on.
slots: The number of job slots the job is consuming (1 for serial jobs, greater than 1 for parallel jobs).
ja-task-ID: For job arrays, the ID of the individual array task.
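If you have a lot of jobs, it can help to tally the state column rather than reading it line by line. The sketch below is a hypothetical bash helper (qstat_state_counts is not an HEC command); it assumes the default qstat layout shown above, with two header lines and the state in the fifth column.

```shell
# Hypothetical helper: count jobs in each state (r, t, qw, Eqw, ...).
# Reads qstat-style text on stdin, so it also works on saved output.
qstat_state_counts() {
    # Skip the column-name line and the dashed rule, then tally
    # column 5 ("state"); sort so the output order is predictable.
    awk 'NR > 2 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Example use on a login node:
#   qstat | qstat_state_counts
```

For the example output above, this would print qw 1 followed by r 2.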


By default, qstat outputs basic information on your own jobs only. If you wish to see all jobs on the system, add an extra argument:

    qstat -u '*'
The list of all users' jobs is usually very long!
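Because the full listing is long, you may prefer a per-user summary. This is another hypothetical bash sketch, again assuming the default qstat layout (user in the fourth column). It counts qstat lines rather than jobs, so an array job with several running tasks contributes one line per running task.

```shell
# Hypothetical helper: number of qstat lines (jobs or array tasks) per user,
# busiest users first. Column 4 of the default qstat layout is "user".
qstat_jobs_per_user() {
    awk 'NR > 2 && NF >= 5 { jobs[$4]++ }
         END { for (u in jobs) print jobs[u], u }' | sort -k1,1nr -k2,2
}

#   qstat -u '*' | qstat_jobs_per_user | head
```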

More information

Resource quotas

To ensure a fair share of the cluster, each user is capped by a set of resource quotas. Jobs submitted to the cluster are eligible to run provided they don't cause the user's resource usage to exceed their current quota. If starting a job would breach a quota, the job is held in the queue until the user's resource usage has dropped far enough to accommodate it, typically when other running jobs complete.

Currently two resource quotas are enforced:

Job slots have a quota of 350 (i.e. a user may have running jobs consuming a total of up to 350 job slots / cores).

Memory usage is capped at a total of 1.36TB (i.e. a user may have running jobs totalling up to 1.36TB of memory reservations which, with a job slot quota of 350, averages roughly 4GB per job slot). Please refer to Running large memory jobs on the HEC for an explanation of job memory reservation requests.
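As a rough planning aid, the two quotas can be combined to estimate how many identical jobs can run at once: the answer is whichever quota runs out first. The sketch below hard-codes the 350-slot and 1.36TB figures quoted above and assumes 1TB = 1024GB. jobs_within_quota is a hypothetical name, and its memory argument is the total reservation of one job (across all its slots) in GB.

```shell
# Hypothetical helper: how many identical jobs fit inside the per-user quotas?
# Usage: jobs_within_quota SLOTS_PER_JOB MEM_GB_PER_JOB
# Assumes the quotas quoted above (350 slots, 1.36TB) and 1TB = 1024GB.
jobs_within_quota() {
    awk -v slots="$1" -v mem="$2" 'BEGIN {
        slot_quota = 350              # job slots per user
        mem_quota  = 1.36 * 1024      # memory quota in GB (about 1392.6GB)
        by_slots = int(slot_quota / slots)
        by_mem   = int(mem_quota / mem)
        n = (by_slots < by_mem) ? by_slots : by_mem   # the tighter quota wins
        print n
    }'
}
```

For example, jobs_within_quota 1 8 prints 174: serial jobs reserving 8GB each hit the memory quota long before the slot quota. With 4GB per job it prints 348 rather than 350, because 350 x 4GB slightly exceeds 1.36TB under the 1024GB assumption.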

Resource quotas can be viewed using the qquota command:


wayland-2017% qquota

resource quota rule limit                filter
--------------------------------------------------------
slotlimits/9        slots=160/2200       users testuser
memlimits/1         h_vmem=768.0G/1.36TB users testuser


Note that resource quotas are only visible when you have running jobs. If you have no running jobs, the qquota command generates no output.
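The qquota figures are easier to judge as percentages. The following hypothetical bash sketch parses the used/limit pairs in the layout shown above; it assumes memory values carry a K, M, G or T suffix (optionally followed by B), and converts both sides to the same unit before dividing.

```shell
# Hypothetical helper: show each qquota limit as a percentage used.
# Memory values are converted to GB; plain numbers (e.g. slots) are left as-is.
qquota_usage() {
    awk '
    function to_num(v,   n) {
        n = v + 0                       # leading numeric part, e.g. 768.0G -> 768
        if (v ~ /[Tt]B?$/) return n * 1024
        if (v ~ /[Gg]B?$/) return n
        if (v ~ /[Mm]B?$/) return n / 1024
        if (v ~ /[Kk]B?$/) return n / (1024 * 1024)
        return n
    }
    $2 ~ /=.*\// {                      # e.g. slots=160/2200
        split($2, kv, "=")              # kv[1] = resource, kv[2] = used/limit
        split(kv[2], ul, "/")           # ul[1] = used, ul[2] = limit
        printf "%s %.1f%%\n", kv[1], 100 * to_num(ul[1]) / to_num(ul[2])
    }'
}

#   qquota | qquota_usage
```

For the example output above, it would report slots 7.3% and h_vmem 55.1%.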




Job lifecycle

A job's lifecycle can be tracked via the state field in the qstat output. All jobs start with a status of qw (queued and waiting). If the cluster is busy, or the job has requested a resource which is currently fully utilised, a job may spend some time in this state. Once an appropriate job slot is available, the job's status changes briefly to t (in transfer) and then to r (running). When a job no longer appears in the qstat output, it has either finished or been deleted.

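A job in the Eqw state is in an error state: it remains queued but will not start running. Running qstat -j jobid shows the job's full details, including the reason for the error.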
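If a script needs to wait for a job to finish, it can poll qstat until the job ID disappears, as described above. This is a minimal sketch rather than an HEC tool: wait_for_job is hypothetical, it assumes the default qstat layout (job-ID in column 1, state in column 5), and it stops early if the job enters an error state such as Eqw. Keep the polling interval generous (the default here is 60 seconds) so that the scheduler is not queried too often.

```shell
# Hypothetical helper: block until JOB_ID no longer appears in qstat output
# (finished or deleted). Returns 1 early if the job enters an error (E) state.
# Usage: wait_for_job JOB_ID [POLL_SECONDS]
wait_for_job() {
    local job_id=$1 interval=${2:-60} listing state
    while :; do
        # If qstat itself fails, retry later rather than assume the job ended.
        listing=$(qstat) || { sleep "$interval"; continue; }
        # Column 1 is job-ID and column 5 is state; take the first matching line.
        state=$(printf '%s\n' "$listing" |
                awk -v id="$job_id" 'NR > 2 && $1 == id { print $5; exit }')
        if [ -z "$state" ]; then
            echo "job $job_id no longer listed by qstat"
            return 0
        fi
        case $state in
            E*) echo "job $job_id is in error state $state" >&2; return 1 ;;
        esac
        sleep "$interval"
    done
}

#   wait_for_job 200 && echo "now safe to read the output files"
```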




Email notification of job completion

You can receive email notification when your jobs complete by adding the options -m e -M foo@example.com to your job submission command, replacing foo@example.com with your own address. Alternatively, you can add the following directives to your job submission script:

#$ -m e
#$ -M foo@example.com


The email will contain a summary of the resources used by your job:

Job 7157196 (mytestjob.com) Complete
User = testuser
Queue = parallel@comp04-03.private.dns.zone
Host = comp04-03.private.dns.zone
Start Time = 10/25/2017 12:59:51
End Time = 10/25/2017 13:15:16
User Time = 04:04:13
System Time = 00:00:45
Wallclock Time = 00:15:25
CPU = 04:04:59
Max vmem = 4.922G
Exit Status = 0

When applied to job arrays, the mailback option results in a notification for every completed array element: a 10,000-element job array would generate 10,000 email notifications. To avoid overloading the mail system, job arrays submitted with the mailback option set are rejected at submission time.

If you'd like to be notified when a job array finishes, create a dummy job (i.e. one which does very little work) with the email notification options above, and make it dependent on the completion of the job array by adding the command line argument -hold_jid jobid, where jobid is the ID of the job array. The dummy job will then wait until all elements of the specified job array have finished before it runs; it will run for a few seconds, complete, and email you.
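The steps above can be scripted. In this sketch, the function name and array.sh are hypothetical. It assumes qsub's -terse option, which prints an array job's ID in the form jobid.first-last:step, and the -b y option, which lets the dummy job run /bin/true directly without a job script. The array itself is submitted without any mail options, so it is not rejected.

```shell
# Hypothetical helper: submit a job array without mail options, then a tiny
# dummy job that waits for the whole array (-hold_jid) and emails on exit (-m e).
# Usage: submit_array_with_notification SCRIPT TASK_RANGE EMAIL
submit_array_with_notification() {
    local script=$1 tasks=$2 email=$3 out array_id
    # -terse makes qsub print only the ID, e.g. 12345.1-10000:1 for an array.
    out=$(qsub -terse -t "$tasks" "$script") || return 1
    array_id=${out%%.*}          # keep the job ID before the first "."
    # -b y runs /bin/true as a binary, so the dummy job needs no script.
    qsub -N "notify_$array_id" -hold_jid "$array_id" -m e -M "$email" -b y /bin/true
}

#   submit_array_with_notification array.sh 1-10000 foo@example.com
```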




Monitoring memory and CPU usage with qtop

The qstat command described above gives basic information about the status of a job. Sometimes, though, it's useful to take a more detailed look at how well a job is running; for example, to see how large a program is while running, or to check that it hasn't stalled. On a single Linux machine, the top command provides a more in-depth view of process status. On the HEC, you can use the qtop command to collect and display the output of top from all compute nodes for all your currently running jobs.

Consider the following qstat output, from qstat -u testuser:

job-ID  prior   name     user      state  submit/start at      queue           slots  ja-task-ID
------------------------------------------------------------------------------------------------
196739  0.0336  TESTJOB  testuser  r      05/14/2014 14:42:56  serial@comp001      1  300
196739  0.0336  TESTJOB  testuser  r      05/14/2014 14:52:11  serial@comp005      1  305
196739  0.0336  TESTJOB  testuser  r      05/14/2014 14:52:11  serial@comp003      1  306
196739  0.0336  TESTJOB  testuser  r      05/14/2014 14:52:11  serial@comp005      1  307

The user is currently running four batch serial jobs on three different compute nodes. The jobs are part of a single array job, so the job-ID field is the same for each, and the ja-task-ID field uniquely identifies each individual task in the array job.

We can generate relevant qtop information for these four jobs by running qtop -u testuser, which produces the following output:

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
comp001
13830 testuser  20   0 2106m 2.0g 3504 R  98.2  8.4  0:42.81  R
13668 testuser  20   0  111m 1756 1464 S   0.0  0.0  0:00.02  bash
comp003
25573 testuser  20   0 2028m 2.0g 3464 R 100.0  8.4 44:58.79  R
24675 testuser  20   0  111m 1752 1464 S   0.0  0.0  0:00.02  bash
comp005
32306 testuser  20   0 2206m 2.1g 3504 R 100.0  9.1 36:09.76  R
32549 testuser  20   0 2201m 2.1g 3504 R 100.0  9.1  7:46.11  R
32144 testuser  20   0  111m 1756 1464 S   0.0  0.0  0:00.02  bash
32387 testuser  20   0  111m 1756 1464 S   0.0  0.0  0:00.03  bash


The process fields are identical to those of the standard Linux top command run in batch mode; see the top man page for an in-depth description of each field. This description covers only the most relevant fields. Processes are grouped so that all of a user's processes on a compute node appear together, under that node's name.

The first thing to note is that the information provided by qtop is very different from that of qstat. qtop is not an integrated part of SGE, so it outputs process information from each compute node rather than job information; a single job typically involves running a number of processes on a compute node. You'll need to compare the qtop and qstat output to work out just what's going on. For example, qtop doesn't give you the job-ID number, and it often lists two or more processes where qstat lists just one job.

The four most relevant fields in the output are labelled COMMAND, VIRT, RES and %CPU.

The COMMAND field shows the name of the command being run by the process. Because jobs are submitted to the cluster as job scripts, each job script itself becomes a process, which appears in qtop's COMMAND field under the name of the chosen job shell (typically bash). The job shell typically consumes very little CPU: it simply sets up the job's working environment and then calls the applications requested in the job submission script. For most purposes, you'll be interested in the other process(es) listed, typically the main application your job script is currently running. In the above example, the processes are all either bash job scripts (one for each job) or calls to the R stats package, one of the applications available on the HEC.

The VIRT and RES fields give the total virtual and resident memory size of each process. Smaller sizes are shown in kilobytes (with no suffix), larger ones in (m)egabytes or even (g)igabytes. The value in the VIRT field is the value used by SGE when assigning memory to jobs. Bear in mind that SGE adds up the sizes of all processes in a job, so the bash shell running the job script also counts towards this total. In the above example, the main processes (those running the R stats package) each have a virtual size of roughly 2.0 to 2.2 gigabytes.
Because jobs which consume more than 0.5 gigabytes are classified as large memory jobs, the user has submitted these jobs with a memory resource request to qsub, to ensure that they are placed on compute nodes with enough free memory to support them.
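To see how the per-process sizes add up on each node, the VIRT column can be summed. This hypothetical sketch assumes qtop's layout as shown above (a line containing just the node name, followed by top's 12 columns with VIRT fifth) and that qtop's output can be piped; it treats unsuffixed values as kilobytes. The totals include the bash job shells, as SGE's memory accounting does, and cover all of your jobs on a node rather than one job.

```shell
# Hypothetical helper: total VIRT per compute node from qtop output, in MB.
#   qtop -u testuser | qtop_virt_per_node
qtop_virt_per_node() {
    awk '
    # Convert a top memory value (e.g. 111m, 2.0g, 3504) to megabytes.
    function to_mb(v,   n) {
        n = v + 0
        if (v ~ /[tT]$/) return n * 1024 * 1024
        if (v ~ /[gG]$/) return n * 1024
        if (v ~ /[mM]$/) return n
        return n / 1024                 # plain number (or k suffix) is KiB
    }
    NF == 1 { node = $1; order[++nodes] = node; next }       # node line, e.g. comp001
    $1 ~ /^[0-9]+$/ && NF >= 12 { total[node] += to_mb($5) }  # VIRT is column 5
    END {
        for (i = 1; i <= nodes; i++)
            printf "%s %.0fM\n", order[i], total[order[i]]
    }'
}
```

For comp005 in the example above, it would report 4629M: two R processes of 2206m and 2201m plus two 111m bash shells.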

The other useful field in the qtop output is %CPU, which shows how much of a single CPU the process is consuming. A running serial job should typically be consuming very close to 100% of a CPU. An MPI job will show multiple processes, each consuming around 100% CPU. OpenMP and other multi-threaded programs will show a single process consuming several hundred percent CPU, ideally 100 times the number of cores being used. Values considerably lower than these ideals will likely indicate a problem: the process might be spending a disproportionate amount of time reading or writing files or, in the case of badly balanced parallel programs, one process might be idle while waiting for a communication from another.
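A quick way to spot under-performing processes is to filter qtop output on the %CPU column. This hypothetical sketch skips the bash job shells and prints processes below a threshold. The default of 90% is an arbitrary assumption that suits serial and MPI processes; multi-threaded processes should instead be judged against 100 times their core count.

```shell
# Hypothetical helper: flag non-shell processes below a %CPU threshold.
# Usage: qtop -u testuser | qtop_low_cpu [MIN_PERCENT]
# %CPU is column 9 and COMMAND column 12, as in top batch-mode output.
qtop_low_cpu() {
    awk -v min="${1:-90}" '
    NF == 1 { node = $1; next }                      # compute node name line
    $1 ~ /^[0-9]+$/ && NF >= 12 && $12 != "bash" && $9 + 0 < min {
        printf "%s pid=%s cpu=%s%% cmd=%s\n", node, $1, $9, $12
    }'
}
```

On the example above this prints nothing with the default threshold, whereas a stricter threshold of 99 flags the R process on comp001 at 98.2%.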

Note that the PID field gives the process ID, not the job ID. Each process on a Linux system is assigned a unique process ID, which forms part of the standard output of top.




Analysing resources used by completed jobs with qacct

You can review the resources used by completed jobs using the qacct command. For example:

qacct -j jobid


which will output the resources used by the specified job. To see records for all of your recent jobs, run the following, replacing username with your own username:

qacct -o username -j


Most of the output fields are self-explanatory, and a full description can be found in the accounting man page. Note that the maximum memory used by the job is recorded in the output field labelled maxvmem.
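When checking whether a memory request was sized sensibly, the maxvmem line is usually the one you want. The sketch below is a hypothetical helper that prints the job number, task ID and maxvmem for each record in qacct -j output. It assumes each record contains jobnumber, taskid and maxvmem lines in that order, with records separated by lines of = signs.

```shell
# Hypothetical helper: one "job task maxvmem" line per qacct record.
#   qacct -j 7157196 | qacct_maxvmem
qacct_maxvmem() {
    awk '/^=+$/             { job = task = "" }   # start of a new record
         $1 == "jobnumber" { job = $2 }
         $1 == "taskid"    { task = $2 }
         $1 == "maxvmem"   { print job, task, $2 }'
}
```

For an array job, this gives one line per task, which makes it easy to see which tasks came closest to their memory request.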

The job logs are rotated on the first of each month, so records are only available for the current calendar month. Older records are available on request.



Related pages