Please note that the command listings here are only examples. The configuration changes from time to time, and the listings would have to be updated fairly often to keep up with it. They resemble the output on the present Louhi system, but, for example, the queue names and certain properties may no longer be valid. You can always get the current listings with the commands shown. In particular, the queuing system configuration is still being worked on, and the final or stable configuration may be very different from the present one. See also chapter Submitting jobs: qsub.
Job status
qstat
Examples of the command qstat, which displays job information, are given here. For a full description, see the manual page qstat(1B). For information about the options, see also the subsection Status of jobs, queues and batch server: qstat.
qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
78478.sdb        prog128d20       user1            00:00:00 R smem-xt5
78574.sdb        prog128d16       user1            00:00:00 R smem-xt5
78597.sdb        my_prog          user2            00:00:04 R small-xt5
78669.sdb        simul            user3            00:00:00 R smem-xt5
...
The output of the command qstat without options shows:
- Job id: the job identifier assigned by PBS
- Name: the job name given by the submitter or formed automatically
- User: the job owner
- Time Use: the CPU time used
- S: the job state:
E - job is exiting after having run
H - job is held
Q - job is queued, eligible to run or routed
R - job is running
T - job is being moved to a new location
W - job is waiting for its execution time (-a option) to be reached
S - job is suspended
- Queue: the queue in which the job resides
See also the command cqstat below.
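The job states listed above can be tallied with a small script. The following is only a minimal Python sketch: it works on a copy of the example listing rather than calling qstat, the names parse_qstat and count_by_state are hypothetical, and it assumes that job names contain no spaces.

```python
from collections import Counter

# Sample text copied from the example listing; on a real system the text
# would come from running qstat, which this sketch does not do.
SAMPLE = """\
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
78478.sdb prog128d20 user1 00:00:00 R smem-xt5
78574.sdb prog128d16 user1 00:00:00 R smem-xt5
78597.sdb my_prog user2 00:00:04 R small-xt5
78669.sdb simul user3 00:00:00 R smem-xt5
"""

# State letters as described in the list above.
STATE_NAMES = {
    "E": "exiting", "H": "held", "Q": "queued", "R": "running",
    "T": "being moved", "W": "waiting", "S": "suspended",
}


def parse_qstat(text):
    """Return one dict per job line of plain qstat output (hypothetical helper)."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # A job line has six fields and an id such as 78478.sdb;
        # this skips the header and the row of dashes.
        if len(fields) != 6 or not fields[0].split(".")[0].isdigit():
            continue
        job_id, name, user, time_use, state, queue = fields
        jobs.append({"id": job_id, "name": name, "user": user,
                     "time_use": time_use, "state": state, "queue": queue})
    return jobs


def count_by_state(jobs):
    """Count jobs per state, using the readable state names where known."""
    return Counter(STATE_NAMES.get(job["state"], job["state"]) for job in jobs)


jobs = parse_qstat(SAMPLE)
print(count_by_state(jobs))
```

Splitting on whitespace keeps the sketch short; a job name containing spaces would need a parser based on the column widths given by the row of dashes.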
qstat -a
sdb:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
78478.sdb       user1    smem-xt5 prog128d2    7480   1   1     -- 72:00 R 61:23
78574.sdb       user1    smem-xt5 prog128d16  23778   1   1     -- 72:00 R 51:41
78597.sdb       user2    small-xt my_prog     13761   1   1     -- 72:00 R 50:15
78669.sdb       user3    smem-xt5 simul       23946   1   1     -- 72:00 R 46:35
....
The output of the command qstat -a (and also with any of the options -i, -r, -u, -n, -s, -G or -M) shows:
- Job ID: the job identifier assigned by PBS
- Username: the job owner
- Queue: the queue in which the job resides
- Jobname: the job name
- SessID: the session id (if the job is running)
- NDS: the number of chunks or nodes requested by the job (not shown correctly)
- TSK: the number of CPUs requested by the job (not shown correctly)
- Req'd Memory: the amount of memory requested by the job
- Req'd Time: either the CPU time, if specified, or the wall time requested by the job (hh:mm).
- S: the job's current state (see above)
- Elap Time: the amount of CPU time or wall time used by the job (hh:mm).
See the command cqstat described below. It shows the number of PEs (cores), the number of cores for each PE and its threads (depth) and the number of PEs per node.
qstat -u ...
Information about a job or the jobs of a particular user can be displayed with commands of the type:
qstat -u user1 3779
qstat -u user1
The first command shows the job 3779 (sequence number) of the user user1 and the second one all jobs of that user.
qstat -f | less
qstat -f 3792 | less
The command qstat -f shows a lot of information about all jobs, and the form qstat -f 3792 shows it for a particular job (here 3792). This includes, e.g., the resources used so far:
resources_used.cpupercent = 0
resources_used.cput = 00:00:00
resources_used.mem = 6236kb
resources_used.ncpus = 1
resources_used.vmem = 29748kb
resources_used.walltime = 00:55:47
and resources reserved for the job:
Resource_List.mpparch = XT
Resource_List.mppmem = 950mb
Resource_List.mppnppn = 2
Resource_List.mppwidth = 256
Resource_List.ncpus = 1
Resource_List.nodect = 1
Resource_List.place = pack
Resource_List.select = 1
Resource_List.walltime = 04:00:00
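The name = value lines of qstat -f are easy to process further. The following Python sketch is an illustration only: it parses an excerpt of the example output into a dictionary and groups the entries by prefix. The helper names parse_full_status and group are hypothetical, the first line of the sample (Job Id: ...) is an assumed layout, and wrapped continuation lines are simply appended to the previous value.

```python
# Excerpt modelled on the example output; the "Job Id:" line is an
# assumption about the layout and is ignored by the parser anyway.
SAMPLE = """\
Job Id: 3792.sdb
    resources_used.cput = 00:00:00
    resources_used.mem = 6236kb
    resources_used.walltime = 00:55:47
    Resource_List.mppmem = 950mb
    Resource_List.mppnppn = 2
    Resource_List.mppwidth = 256
    Resource_List.walltime = 04:00:00
"""


def parse_full_status(text):
    """Parse 'name = value' lines into a dict (hypothetical helper).

    A line without ' = ' is treated as the continuation of the previous
    value, which is how long lines such as state_count are wrapped.
    """
    attrs = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " = " in line:
            key, value = line.split(" = ", 1)
            last_key = key.strip()
            attrs[last_key] = value.strip()
        elif last_key is not None:
            attrs[last_key] += line
    return attrs


def group(attrs, prefix):
    """Return the entries named 'prefix.something' with the prefix removed."""
    start = prefix + "."
    return {k[len(start):]: v for k, v in attrs.items() if k.startswith(start)}


attrs = parse_full_status(SAMPLE)
print(group(attrs, "resources_used"))
print(group(attrs, "Resource_List"))
```

Keeping the values as strings avoids guessing at units (kb, mb, hh:mm:ss); a real script would convert only the fields it needs.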
cqstat
This is a filter script for the command qstat -f that shows selected parts of its extensive output. The result contains some information which is not shown correctly by the command qstat -a (see above). In particular, cqstat correctly shows the number of PEs (cores in pure MPI) (MPP Width), the number of cores for each PE and its threads (MPP Depth) and the number of PEs per node (MPP NPPN):
Time MPP MPP MPP
Job ID User Jobname S Queue Queued Walltime Width Depth NPPN
----------- -------- -------- - -------- -------- -------- ----- ----- ----
78478.sdb user1 prog128 R smem-xt5 13:06:52 61:05:12 64 8
78574.sdb user1 prog128 R smem-xt5 03:25:20 51:23:47 64 8
78597.sdb user2 my_prog R small-xt5 01:58:57 49:57:25 256 8
78669.sdb user3 XXyz R smem-xt5 22:45:33 46:16:50 64 8
78704.sdb user4 simul R smem-xt5 20:43:28 42:57:19 128 8
...
Other commands
There are several commands starting with xt which show, among other things, information about jobs on the compute nodes; see Resource monitoring.
Queue status
Please note that the queue names, their configuration (e.g., maximum limits) and purpose may change from what is shown here. You can always find out the current configuration by the commands explained below.
qstat
qstat -Q
Queue           Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
special           1     0  no yes     0     0     0     0     0     0 Exec
ops               0     0  no yes     0     0     0     0     0     0 Exec
workq             0     0  no yes     0     0     0     0     0     0 Exec
xt4lgmem          0     0 yes yes     0     0     0     0     0     0 Exec
xt4               0     6 yes yes     4     2     0     0     0     0 Exec
xt5               0     5 yes yes     2     3     0     0     0     0 Exec
all               0     0 yes yes     0     0     0     0     0     0 Exec
xt5smlmem         0     2 yes yes     2     0     0     0     0     0 Exec
xt5lgmem          0     0 yes yes     0     0     0     0     0     0 Exec
The non-existent queues special, ops and workq serve here only as examples of disabled queues. The queue parallel (not shown in this listing) is a routing queue from which jobs, e.g., those in the hold state, are moved to an execution queue depending on the resources requested.
The command qstat -Q shows the available queues and their status:
- Queue: the queue name
- Max: the maximum number of jobs that may be run in the queue concurrently (0 = not defined)
- Tot: the total number of jobs in the queue
- Ena: the enabled (yes) or disabled (no) status of the queue
- Str: the started (yes) or stopped (no) status of the queue
- Que, Run, Hld, Wat, Trn, Ext (same as Q, R, H, W, T, E above, respectively):
for each job state, the name of the state and the number of jobs in the queue in that state.
- Type: the type of queue, execution or routing.
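As an illustration of how these columns can be used, the following Python sketch reads a copy of the example qstat -Q listing, picks the queues that are both enabled and started, and sums the running and queued jobs. It is a sketch under the assumption that the output has exactly the twelve columns shown; parse_queue_table is a hypothetical name.

```python
# Copy of the example listing; nothing here calls qstat itself.
SAMPLE = """\
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
special 1 0 no yes 0 0 0 0 0 0 Exec
ops 0 0 no yes 0 0 0 0 0 0 Exec
workq 0 0 no yes 0 0 0 0 0 0 Exec
xt4lgmem 0 0 yes yes 0 0 0 0 0 0 Exec
xt4 0 6 yes yes 4 2 0 0 0 0 Exec
xt5 0 5 yes yes 2 3 0 0 0 0 Exec
all 0 0 yes yes 0 0 0 0 0 0 Exec
xt5smlmem 0 2 yes yes 2 0 0 0 0 0 Exec
xt5lgmem 0 0 yes yes 0 0 0 0 0 0 Exec
"""

NUMERIC_COLUMNS = ("Max", "Tot", "Que", "Run", "Hld", "Wat", "Trn", "Ext")


def parse_queue_table(text):
    """Return one dict per queue, keyed by the column names of the header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        # Skip the row of dashes and anything with an unexpected field count.
        if len(fields) != len(header) or set(fields[0]) == {"-"}:
            continue
        row = dict(zip(header, fields))
        for column in NUMERIC_COLUMNS:
            row[column] = int(row[column])
        rows.append(row)
    return rows


queues = parse_queue_table(SAMPLE)
usable = [q["Queue"] for q in queues if q["Ena"] == "yes" and q["Str"] == "yes"]
running = sum(q["Run"] for q in queues)
queued = sum(q["Que"] for q in queues)
print(usable, running, queued)
```

For the example listing this reports six usable queues with 5 running and 8 queued jobs in total.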
qstat -q
This command shows the status of queues in an alternative format:
server: sdb
Queue            Memory CPU Time Walltime Node   Run   Que   Lm State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
xt4lgmem             --       -- 12:00:00   --     0     0   -- E R
xt4                  --       -- 12:00:00   --     2     4   -- E R
xt5                  --       -- 12:00:00   --     3     2   -- E R
all                  --       --       --   --     0     0   -- E R
xt5smlmem            --       -- 12:00:00   --     0     2   -- E R
xt5lgmem             --       -- 12:00:00   --     0     0   -- E R
                                               ----- -----
                                                   5     8
- Queue: the queue name
- Memory: the maximum amount of memory a job in the queue may request
- CPU Time: the maximum amount of CPU time a job in the queue may request
- Walltime: the maximum amount of wall time a job in the queue may request
- Node: the maximum number of nodes a job in the queue may request
- Run: the number of jobs in the queue in the running state
- Que: the number of jobs in the queue in the queued state
- Lm: the maximum number (limit) of jobs that may be run in the queue concurrently
- State: The state of the queue is given by a pair of letters:
- E R - Enabled Running (started)
- E S - Enabled Stopped
- D R - Disabled Running (started)
- D S - Disabled Stopped
The output shows that a maximum walltime of 12 hours has been defined for the execution queues (except for the queue all), but no limits are set for maximum memory or CPU time.
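The two-letter State column and the limits can also be read by a script. The following Python sketch works on a copy of the example qstat -q listing and assumes that every queue row has exactly the ten whitespace-separated fields shown; parse_queue_limits is a hypothetical name.

```python
# Copy of the example listing, including the summary rows at the end.
SAMPLE = """\
server: sdb
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
xt4lgmem -- -- 12:00:00 -- 0 0 -- E R
xt4 -- -- 12:00:00 -- 2 4 -- E R
xt5 -- -- 12:00:00 -- 3 2 -- E R
all -- -- -- -- 0 0 -- E R
xt5smlmem -- -- 12:00:00 -- 0 2 -- E R
xt5lgmem -- -- 12:00:00 -- 0 0 -- E R
----- -----
5 8
"""


def parse_queue_limits(text):
    """Return one dict per queue row of the qstat -q listing."""
    queues = []
    for line in text.splitlines():
        fields = line.split()
        # A queue row has ten fields and ends with the two state letters;
        # this also rejects the header, the dashes and the summary rows.
        if (len(fields) != 10 or fields[8] not in ("E", "D")
                or fields[9] not in ("R", "S")):
            continue
        queues.append({
            "queue": fields[0],
            "walltime": None if fields[3] == "--" else fields[3],
            "run": int(fields[5]),
            "que": int(fields[6]),
            "enabled": fields[8] == "E",   # E = enabled, D = disabled
            "started": fields[9] == "R",   # R = running (started), S = stopped
        })
    return queues


limits = parse_queue_limits(SAMPLE)
no_walltime_limit = [q["queue"] for q in limits if q["walltime"] is None]
print(no_walltime_limit)
```

In the example listing only the queue all has no walltime limit, and every queue is in the state E R.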
qstat -Q -f | less
This shows the full status of all queues, including default resources (resource limits) and other resource restrictions. For a specific queue, add the queue name, e.g., xt5:
qstat -Q -f xt5
Queue: xt5
queue_type = Execution
total_jobs = 5
state_count = Transit:0 Queued:2 Held:0 Waiting:0 Running:3 Exiting:0 Begun
:0
resources_max.mpparch = XT
resources_max.mppnodes = 384-479,768-991,1280-1375,1408-1503,1792-2015
resources_max.mppnppn = 8
resources_max.walltime = 12:00:00
resources_min.mpparch = XT
resources_min.mppnodes = 384-479,768-991,1280-1375,1408-1503,1792-2015
resources_default.mpparch = XT
resources_default.mppnppn = 8
resources_assigned.ncpus = 3
resources_assigned.nodect = 3
kill_delay = 10
enabled = True
started = True
This shows that there may also be minimum and maximum limits for certain resources. The maximum walltime in this queue is 12 hours. The default and maximum number of PEs per node (mppnppn) is 8 (in XT4 queues it is 4). The listing also shows which nodes are available when the queue is used.
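The node list in resources_max.mppnodes is given as comma-separated ranges. As a small worked example, the following Python sketch counts the nodes in the list shown above; count_nodes is a hypothetical helper, and multiplying by mppnppn to get an upper bound for the number of PEs is an assumption about how the two limits combine.

```python
def count_nodes(spec):
    """Count the nodes in a list such as '384-479,768-991' (ranges inclusive)."""
    total = 0
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        last = last or first          # a single node id has no '-'
        total += int(last) - int(first) + 1
    return total


# Values copied from the example listing for the queue xt5.
MPPNODES = "384-479,768-991,1280-1375,1408-1503,1792-2015"
MPPNPPN = 8

nodes = count_nodes(MPPNODES)
max_pes = nodes * MPPNPPN             # assumed upper bound: nodes x PEs per node
print(nodes, max_pes)
```

With the values of the example listing this gives 736 nodes, which at 8 PEs per node corresponds to 5888 PEs.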
Server status
qstat
qstat -B
This shows the available PBS servers and their status:
Server             Max   Tot   Que   Run   Hld   Wat   Trn   Ext Status
---------------- ----- ----- ----- ----- ----- ----- ----- ----- -----------
sdb                  0    13     8     5     0     0     0     0 Active
Because there is only one PBS server on Louhi, you can get its full status with any of the following commands:
qstat -B -f
qstat -B -f sdb or qstat -B -f nid00003
Server: sdb
server_state = Active
server_host = nid00003
scheduling = True
total_jobs = 13
state_count = Transit:0 Queued:8 Held:0 Waiting:0 Running:5 Exiting:0 Begun
:0
managers = root@nid00008,root@boot001,ui14@nid00008
default_queue = parallel
log_events = 511
mail_from = root
query_other_jobs = True
resources_default.mppmem = 1000mb
resources_default.mppwidth = 128
resources_default.walltime = 04:00:00
default_chunk.ncpus = 1
resources_max.mppmem = 2000mb
resources_assigned.ncpus = 5
resources_assigned.nodect = 5
scheduler_iteration = 600
flatuid = True
FLicenses = 9463
resv_enable = True
node_fail_requeue = 310
max_array_size = 10000
pbs_license_file_location = 7788@nid00128
pbs_license_min = 0
pbs_license_max = 2147483647
pbs_license_linger_time = 3600
license_count = Avail_Global:9462 Avail_Local:1 Used:5 High_Use:6
pbs_version = PBSPro_9.2.2.82426
eligible_time_enable = False
The server resource limits shown here apply to all queues unless different limits have been set for the queues themselves (see the qstat -Q -f command above).
This shows that (at the time of writing) the default wall time is four hours (4:00:00 or 14400 s, -l walltime=4:00:00). If you don't need that much, you should request a shorter time, and if you need more, you must request more. The default memory per PE is 1000 MB (-l mppmem=1000M). The maximum memory per PE is set here to 2000 MB, but the settings of the queues themselves may override it. Again, if you need less, request less, and if you need more, request more.
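The relation between the hh:mm:ss form and seconds, and the shape of the -l requests, can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the limit of 2000 MB is the server value from the listing above (a queue may set a different one), and the script only builds the option strings without submitting anything.

```python
def walltime_to_seconds(text):
    """Convert 'hh:mm:ss' (or 'mm:ss', or plain seconds) to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def seconds_to_walltime(seconds):
    """Convert seconds to the h:mm:ss form used in -l walltime=..."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def resource_options(walltime_s, mppmem_mb, max_mppmem_mb=2000):
    """Build -l options for a job; the default limit is taken from the listing."""
    if mppmem_mb > max_mppmem_mb:
        raise ValueError(f"mppmem {mppmem_mb} MB exceeds limit {max_mppmem_mb} MB")
    return ["-l", f"walltime={seconds_to_walltime(walltime_s)}",
            "-l", f"mppmem={mppmem_mb}M"]


print(walltime_to_seconds("4:00:00"))      # the server default in seconds
print(resource_options(3600, 500))         # one hour, 500 MB per PE
```

Requesting only what the job needs, as in the second call, follows the advice above.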
General usage policy and queue arrangements are described in section Usage policy and obtaining a user id.