Resource monitoring (system and user)

This section describes resource monitoring at the system level and at the user level.

Status of an individual node

The following standard commands show information about processes and users on an individual node.

top

Shows an almost real-time view of a running system. top is terminated by pressing q. The view can be restricted to show processes belonging to a particular user by pressing u and typing the username. The update interval can be set by pressing d and typing the delay in seconds.

ps

Prints a list of the current processes.

w

Shows who is logged on and what they are doing.

uptime

Shows the system uptime.

who

Shows who is logged in.
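
The commands above can also be used non-interactively, for example inside a script. The following is a minimal sketch only: the function name logged_in_users is hypothetical (it is not a system command), and it assumes that the user name is the first column of the who output.

```shell
#!/bin/sh
# Hypothetical helper (not a system command): print each distinct user name
# found in `who`-style output read from standard input, one per line, sorted.
# Assumes the user name is the first whitespace-separated field of each line.
logged_in_users() {
    awk 'NF > 0 { print $1 }' | sort -u
}

# Typical use on a node (commented out so the sketch has no side effects):
#   who | logged_in_users
```

A user with several open sessions is listed only once, which makes the output easier to read than the raw who listing on a busy node.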

How to monitor and display job, queue, host and SLURM server status

Many of the commands described below accept the option -l (lower-case L), which gives more detailed output. For more information about any of the commands described here, see its manual page.

Job status

scontrol

scontrol show jobs | less

scontrol show jobs <slurm_job_id>

The first scontrol command shows detailed information about all running and very recently ended jobs of all users. The second command shows the same information for one particular job only.
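
The scontrol output consists of KEY=VALUE fields, so a single field can be picked out in a script. The sketch below is one possible way to do this. The function name job_field is hypothetical, and the sketch assumes that fields are separated by whitespace and that the wanted value contains no spaces (this may not hold for fields such as the job name or command).

```shell
#!/bin/sh
# Hypothetical helper: print the value of one KEY=VALUE field from
# `scontrol show jobs`-style text read from standard input.
# Assumption: fields are whitespace-separated and the value has no spaces.
job_field() {
    key=$1
    # Put every field on its own line, then print the first field
    # that starts with "KEY=" without the "KEY=" prefix.
    tr -s ' \t' '\n' | awk -v k="$key" '
        index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }
    '
}

# Example use (replace <slurm_job_id> with a real job id):
#   scontrol show jobs <slurm_job_id> | job_field JobState
```

Only the first matching field is printed, so the helper is best used with the output for a single job.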

squeue

Shows information about jobs and job steps. It shows, for example, the nodes where each job is running.

  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
   8079  corander     tcsh ruusvuor   R      21:41      1 n241
   8088  corander     srun jppirhon   R       0:07      1 n241
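
The squeue listing can be summarised in a script, for example to count running jobs per user. This is a minimal sketch under stated assumptions: the function name running_jobs_per_user is hypothetical, and it relies on the default column order shown above (USER in column 4, ST in column 5), which breaks if a job name contains spaces.

```shell
#!/bin/sh
# Hypothetical helper: count running jobs (state "R") per user from
# squeue default-format output read from standard input.
# Assumption: USER is field 4 and ST is field 5, as in the listing above.
running_jobs_per_user() {
    awk 'NR > 1 && $5 == "R" { n[$4]++ }        # skip the header line
         END { for (u in n) print u, n[u] }' | sort
}

# Example use:
#   squeue | running_jobs_per_user
```

For the two-job listing above, each of the two users would be reported with one running job.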

Status of the whole system

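sinfo -all

A quick view of the compute node states, including allocated and idle nodes. The option string -all is read as the short options -a (show all partitions) and -l (long format).
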
Tue Mar  2 09:37:20 2010
PARTITION AVAIL  TIMELIMIT   JOB_SIZE ROOT SHARE     GROUPS  NODES       STATE NODELIST
parallel     up   infinite       1-12   no    NO        all    240        idle n[1-240]
serial*      up   infinite          1   no FORCE        all    240        idle n[1-240]
corander     up   infinite 1-infinite   no    NO corander,c     32        idle n[241-272]
benchmark    up   infinite 1-infinite   no YES:4        all    272        idle n[1-272]

smap

A visual overview of the queueing system state. It shows the placement of all current jobs in the system.

┌─────────────────────────────────────────────────────────────────────────────────┐
│.................................................................................│
│.................................................................................│
│..............................................................................A..│
│.............................                                                    │
└─────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────┐
│Tue Mar  2 09:33:24 2010                                                         │
│ID JOBID   PARTITION USER     NAME      ST      TIME NODES NODELIST              │
│A  8079    corander  ruusvuor tcsh      R   00:00:12     1 n241                  │
└─────────────────────────────────────────────────────────────────────────────────┘

sjstat

Similar to the sinfo command, but also shows the memory and CPU configuration of the nodes.

$ sjstat -c

-------------------------------------------------------------
Pool        Memory  Cpus  Total Usable   Free  Other Traits
-------------------------------------------------------------
parallel   16000Mb    12    125    124     19
parallel   32000Mb    12    112    112      9
serial*    16000Mb    12    125    124     19
serial*    32000Mb    12    112    112      9
longrun    16000Mb    12    125    124     19
longrun    32000Mb    12    112    112      9
interacti  16000Mb    12      3      3      3
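
The sjstat -c table lists each pool once per memory class, so the number of free nodes in a pool is the sum over its rows. The following sketch shows one way to compute it. The function name free_nodes_in_pool is hypothetical, and it assumes the column layout shown above, with Free as the sixth column.

```shell
#!/bin/sh
# Hypothetical helper: sum the "Free" column (field 6) of `sjstat -c`
# output for one pool, across all of its memory classes.
# Assumption: the column layout matches the table above.
free_nodes_in_pool() {
    pool=$1
    # Header and separator lines never match the pool name, so they are ignored.
    awk -v p="$pool" '$1 == p { free += $6 } END { print free + 0 }'
}

# Example use:
#   sjstat -c | free_nodes_in_pool parallel
```

With the sample output above, free_nodes_in_pool parallel prints 28 (19 + 9). The identical counts for parallel, serial* and longrun suggest that these pools share the same nodes, so the totals of different pools should not be added together.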