Status of an individual node
The following standard commands show information about processes and users on an individual node.
top
Shows a near real-time view of the running system. Press q to quit top. To show only the processes of a particular user, press u and type the username. To change the update interval, press d and type the delay in seconds.
ps
Prints a list of the current processes.
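The interactive top keys described above also have command-line equivalents, and ps can be narrowed to one user's processes, which is handy for a quick one-off check on a node. This is a minimal sketch assuming the procps-ng versions of top and ps found on most Linux systems. The function names top_snapshot and my_procs are hypothetical helpers, not installed commands.

```shell
# Hypothetical helpers; assume procps-ng top and ps.

# One batch-mode snapshot of top, restricted to one user
# (defaults to the current user). -b: batch mode, -n 1: a single update,
# -u: the same user filter as pressing u interactively.
top_snapshot() {
  top -b -n 1 -u "${1:-$(id -un)}"
}

# List one user's processes (defaults to the current user) with PID,
# elapsed time, CPU usage, resident memory in KiB and full command line.
my_procs() {
  ps -u "${1:-$(id -un)}" -o pid,etime,pcpu,rss,args
}

# Interactive equivalent of pressing u and d inside top:
#   top -u "$USER" -d 5
```

Called without an argument, my_procs lists your own processes on the current node; my_procs some_user lists another user's.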
w
Shows who is logged on and what they are doing.
uptime
Shows the system uptime.
who
Shows who is logged in.
How to monitor and display job, queue, host and SLURM server status
Many of the commands described below accept the option -l (lowercase L), which gives more detailed output. For more information about the commands described here, see their manual pages.
Job status
scontrol
scontrol show jobs | less
scontrol show jobs <slurm_job_id>
The first command shows detailed information about all running and recently ended jobs; the second shows the same information for a single job only.
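The detailed scontrol output consists of Key=Value pairs. When only one field is needed, such as the job state or the node list, a small filter saves paging through less. This is a sketch: job_field is a hypothetical helper, and it assumes the requested value contains no spaces (a field such as WorkDir with a space in the path would be cut off).

```shell
# Hypothetical helper: print the value of one Key=Value field
# from `scontrol show jobs` output read on stdin.
job_field() {
  # Put each space-separated token on its own line,
  # then print the value of every token that starts with "Key=".
  tr -s ' \n' '\n\n' | sed -n "s/^$1=//p"
}

# Example: state of job 8079
#   scontrol show jobs 8079 | job_field JobState
```

If the input describes several jobs, one value per job is printed.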
squeue
Shows information about jobs and job steps, for example the nodes on which each job is running.
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
8079 corander tcsh ruusvuor R 21:41 1 n241
8088 corander srun jppirhon R 0:07 1 n241
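squeue also accepts filters such as -u (user) and -t (job state); for example, squeue -u $USER lists only your own jobs. To summarise a busy queue, the default output can be piped through awk. The sketch below assumes the default column layout shown above (USER in column 4, ST in column 5) and job names without spaces; jobs_per_user is a hypothetical helper.

```shell
# Hypothetical helper: count jobs per user and state from default
# squeue output on stdin. The first (header) line is skipped.
jobs_per_user() {
  awk 'NR > 1 { n[$4 " " $5]++ } END { for (k in n) print k, n[k] }' | sort
}

# Example:
#   squeue | jobs_per_user
```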
Status of the whole system
- sinfo --all --long
- A quick view of compute node states, including allocated and idle nodes.
Tue Mar 2 09:37:20 2010
PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT SHARE GROUPS NODES STATE NODELIST
parallel up infinite 1-12 no NO all 240 idle n[1-240]
serial* up infinite 1 no FORCE all 240 idle n[1-240]
corander up infinite 1-infinite no NO corander,c 32 idle n[241-272]
benchmark up infinite 1-infinite no YES:4 all 272 idle n[1-272]
- smap
- A visually attractive view of the queueing system state, showing where all current jobs are placed in the system.
┌─────────────────────────────────────────────────────────────────────────────────┐
│.................................................................................│
│.................................................................................│
│..............................................................................A..│
│............................. │
└─────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────┐
│Tue Mar 2 09:33:24 2010 │
│ID JOBID PARTITION USER NAME ST TIME NODES NODELIST │
│A 8079 corander ruusvuor tcsh R 00:00:12 1 n241 │
└─────────────────────────────────────────────────────────────────────────────────┘
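For scripts, sinfo can print only selected fields with -h (no header) and -o (output format); for example, %P is the partition, %D the node count and %t the compact node state. The awk filter below, a hypothetical helper named idle_nodes, is a sketch that totals idle nodes per partition from that output. A node belonging to several partitions (as n1 to n240 do in the sinfo output above) is counted once in each of them.

```shell
# Hypothetical helper: total idle nodes per partition from the output of
#   sinfo -h -o "%P %D %t"
# (columns: partition, node count, compact state).
# States with suffixes, such as idle* for a non-responding node, are not counted.
idle_nodes() {
  awk '$3 == "idle" { n[$1] += $2 } END { for (p in n) print p, n[p] }' | sort
}

# Example:
#   sinfo -h -o "%P %D %t" | idle_nodes
```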
sjstat
Similar to sinfo, but also shows the memory and CPU configuration of the nodes in each pool.
$ sjstat -c
-------------------------------------------------------------
Pool Memory Cpus Total Usable Free Other Traits
-------------------------------------------------------------
parallel 16000Mb 12 125 124 19
parallel 32000Mb 12 112 112 9
serial* 16000Mb 12 125 124 19
serial* 32000Mb 12 112 112 9
longrun 16000Mb 12 125 124 19
longrun 32000Mb 12 112 112 9
interacti 16000Mb 12 3 3 3
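The sjstat -c output has a separate row for each memory size within a pool. To see how many nodes are free in each pool regardless of memory size, the Free column can be totalled; with the output above this gives, for example, 28 free nodes in parallel (19 + 9). This sketch assumes the layout shown above (pool name in column 1, memory ending in Mb in column 2, Free in column 6); free_per_pool is a hypothetical helper name.

```shell
# Hypothetical helper: total the Free column per pool from `sjstat -c`
# output on stdin. Only data rows (memory column ending in "Mb") are used,
# so the header and dashed separator lines are skipped.
free_per_pool() {
  awk '$2 ~ /Mb$/ { free[$1] += $6 } END { for (p in free) print p, free[p] }' | sort
}

# Example:
#   sjstat -c | free_per_pool
```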