Running programs (batch and interactive)

This section briefly introduces how to run programs on Vuori.

Login node jobs

On the login node (vuori.csc.fi), users may run only small test programs without the scheduler (SLURM). For example, a small serial program can be run with:

./my_serial_test_executable

Some programs need a purpose-built environment, which is provided by a modulefile. Typically, a modulefile contains instructions that set or alter shell environment variables, such as PATH, to enable access to installed software or libraries.

To view the modulefiles that are currently loaded in your environment:

module list

To see all available modulefiles:

module avail

If the program needs shared libraries that are made available through a modulefile, load that modulefile before running the executable. This is not necessary if the modulefile is already loaded, either by you or by default. For example:

module load modulefilename


Compute node jobs

Jobs are submitted to SLURM from the login node. SLURM places each job in a queue and runs it when the necessary resources become available on the compute nodes.

IMPORTANT: All files needed by a job, for example the program and its input and output files, must be copied to $WRKDIR. Remember to run the necessary module load modulefilename commands as well. The limits of the interactive sessions (maximum number of cores, maximum run time) are listed at the bottom of this page.
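
As a minimal sketch of this staging step, assuming a job directory called my_job and files called my_program and input.dat (all hypothetical names), the copy could look as follows in POSIX sh. The fallback to a temporary directory is an assumption made only so that the sketch also runs where $WRKDIR is not defined:

```shell
#!/bin/sh
# Stage job files into the work directory (hypothetical file names).
# On Vuori $WRKDIR is expected to be set; the fallback is only for
# trying the sketch on another machine.
JOB_DIR="${WRKDIR:-${TMPDIR:-/tmp}}/my_job"
mkdir -p "$JOB_DIR"

for f in my_program input.dat; do
    if [ -e "$f" ]; then
        cp "$f" "$JOB_DIR/"
    else
        echo "not found, skipping: $f" >&2
    fi
done

cd "$JOB_DIR" || exit 1   # start salloc or sbatch from here
```

Starting the job from the staged directory keeps relative paths such as ./my_program valid on the compute nodes.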

Interactive serial job

salloc -p interactive -n 1 -t 02:00:00   (allocate the resources)
srun ./my_serial_executable              (launch the job)
exit                                     (release the allocation)

Options:
-n number of processes (number of cores)
-t running time, wallclock, format hh:mm:ss (hours:minutes:seconds)

Alternatively, as a one-liner:
salloc -p interactive -n 1 -t 02:00:00 srun ./my_serial_executable

If an application has a command-line interface (such as the gdb debugger), the following example starts a pseudo-terminal on a compute node. Once the resources have been allocated, the debugging session can be launched as usual:

salloc -p interactive -n 1 -t 02:00:00 srun --pty $SHELL
gdb ./my_program
exit

Alternatively, as a one-liner:
salloc -p interactive -n 1 -t 02:00:00 srun --pty gdb ./my_program
 
SEE ALSO
man salloc, man srun


Interactive non-MPI parallel job

salloc -n 4 --mem-per-cpu=1000 -t 01:30:00 -p interactive
srun ./my_executable
exit

Options:
-n number of processes (number of cores)
-t running time, wallclock, format hh:mm:ss (hours:minutes:seconds)
--mem-per-cpu per-process memory limit in MB (for example, 1 GB = 1000 MB)

Alternatively, as a one-liner:
salloc -n 4 --mem-per-cpu=1000 -t 01:30:00 -p interactive srun ./my_executable

SEE ALSO
man salloc, man srun

Interactive MPI-parallel job

salloc -n 24 --ntasks-per-node=12 --mem-per-cpu=1000 -t 00:30:00 -p parallel
srun ./my_MPI_executable
exit

Options:
-n number of processes (number of cores)
--ntasks-per-node number of tasks per node; Vuori has 12 cores per node, so a value of 12 distributes the job over as few nodes as possible
-t running time, wallclock, format hh:mm:ss (hours:minutes:seconds)
--mem-per-cpu per-process memory limit in MB

Alternatively, as a one-liner:
salloc -n 24 --ntasks-per-node=12 --mem-per-cpu=1000 -t 00:30:00 -p parallel srun ./my_MPI_executable
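
The relationship between -n and --ntasks-per-node is plain integer arithmetic. The following sketch in POSIX sh uses a hypothetical helper, nodes_needed, to show how many nodes a request occupies when each node holds at most 12 tasks:

```shell
#!/bin/sh
# Vuori has 12 cores per node (see the options above).
CORES_PER_NODE=12

# nodes_needed N: integer ceiling of N / CORES_PER_NODE.
# Hypothetical helper, not part of SLURM.
nodes_needed() {
    echo $(( ($1 + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

nodes_needed 24   # prints 2: two full nodes
nodes_needed 30   # prints 3: the third node is only half used
```

With -n 24 and --ntasks-per-node=12, the job therefore fills exactly two nodes.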

SEE ALSO
man salloc, man srun

Submitting serial or parallel batch jobs

A serial or parallel batch job is submitted using sbatch:

sbatch my_job_script

SEE ALSO
man sbatch

All files needed by a job, for example the program and its input and output files, must be in $WRKDIR. $WRKDIR is available on all nodes and always points to the same location within the cluster.

Remember to include the necessary module load modulefilename commands in the script. Loading all required modulefiles in the script itself ensures that the job always runs in the correct environment.

Serial batch job

#!/bin/csh
###
### serial job script example
###

## name of your job
#SBATCH -J my_jobname

## system error message output file
#SBATCH -e my_output_err_%j

## system message output file
#SBATCH -o my_output_%j

## a per-process (soft) memory limit
## limit is specified in MB
## example: 1 GB is 1000
#SBATCH --mem-per-cpu=1000

## how long a job takes, wallclock time hh:mm:ss
#SBATCH -t 01:01:00

## number of processes
#SBATCH -n 1

## run my executable
srun ./my_serial_program
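
In the script above, %j in the -o and -e file names is replaced by the job ID. As a sketch of how the pieces fit together, the following POSIX sh fragment submits the script and reports which files to expect. The helper parse_jobid is hypothetical; it relies on sbatch printing a line of the form "Submitted batch job <id>". The submission is skipped where sbatch or my_job_script is not available:

```shell
#!/bin/sh
# Extract the job ID from the confirmation line printed by sbatch.
# Hypothetical helper, not part of SLURM.
parse_jobid() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Submit only where SLURM is installed and the script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f my_job_script ]; then
    jobid=$(sbatch my_job_script | parse_jobid)
    echo "output: my_output_${jobid}  errors: my_output_err_${jobid}"
    squeue -j "$jobid"   # show the state of the job in the queue
fi
```

A queued or running job can be cancelled with scancel followed by the job ID.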

Parallel batch job

#!/bin/csh
###
### parallel job script example
###

## name of your job
#SBATCH -J my_jobname

## system error message output file
#SBATCH -e my_output_err_%j

## system message output file
#SBATCH -o my_output_%j

## a per-process (soft) memory limit
## limit is specified in MB
## example: 1 GB is 1000
#SBATCH --mem-per-cpu=1000

## how long a job takes, wallclock time hh:mm:ss
#SBATCH -t 11:01:00

## the number of processes (number of cores)
#SBATCH -n 24

## parallel queue
#SBATCH -p parallel

## run my MPI executable
srun ./my_mpi_program


OpenMP and hybrid OpenMP/MPI jobs

Use the option --cpus-per-task (or -c for short) with the commands salloc and sbatch to allocate cores for threads. The following commands both reserve four cores for the program:

salloc --cpus-per-task=4 srun ./my_openmp_app
salloc -c 4 srun ./my_openmp_app

The environment variable OMP_NUM_THREADS specifies the number of OpenMP threads. By default, one thread is started per core, which on Vuori means 12 threads on a node. To match the previous allocation, set:

setenv OMP_NUM_THREADS 4
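
Inside a job, the thread count can also be derived from the allocation instead of being typed a second time. This sketch assumes that the installed SLURM version exports SLURM_CPUS_PER_TASK when -c is given. It uses sh syntax; in a csh script the same assignment would be written with setenv, as above:

```shell
#!/bin/sh
# SLURM_CPUS_PER_TASK is set by SLURM when --cpus-per-task (-c) is requested.
# Falling back to one thread when it is unset is an assumption of this sketch.
OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS

echo "OpenMP threads per task: $OMP_NUM_THREADS"
```

Deriving the value this way keeps the number of threads and the number of reserved cores from drifting apart when the -c value is changed.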

The following commands will both run a hybrid OpenMP/MPI job. They allocate four tasks and six cores per task for the program:

salloc --ntasks=4 --cpus-per-task=6 srun ./my_hybrid_app
salloc -n 4 -c 6 srun ./my_hybrid_app

Binding threads to cores

The runtime libraries on Vuori support core affinity, which binds a thread to particular cores. In general, this improves performance. The binding is controlled with compiler-specific environment variables as follows:

PGI

setenv MP_BIND yes

MP_BIND must be set to yes (the default is no). Otherwise, all threads on a node run on a single core.

PathScale

setenv PSC_OMP_AFFINITY TRUE
setenv PSC_OMP_AFFINITY_GLOBAL TRUE

PSC_OMP_AFFINITY defaults to TRUE, so it does not need to be set explicitly.

GCC

setenv GOMP_CPU_AFFINITY "0-11"

This variable must be set. Otherwise, all threads on a node run on a single core.


For more information, see the chapter Shared memory parallelization.



Queue/partition limits

  • serial (default queue), max nodes=1, max CPUs=12, run time limit 7 days
  • parallel, max nodes=12, max CPUs=144, run time limit 7 days
  • longrun, max nodes=12, max CPUs=144, run time limit 21 days
  • interactive, max nodes=1, max CPUs=12, run time limit 4 hours
  • gpu, run time limit 24 hours
  • gpu6g, run time limit 12 hours
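
Before submitting, a requested run time can be compared against these limits. The following is a minimal sketch in POSIX sh with a hypothetical helper, to_seconds, that converts the hh:mm:ss format used with -t (it does not handle other time formats). awk does the arithmetic so that fields such as 08 are read as decimal numbers:

```shell
#!/bin/sh
# Convert hh:mm:ss to seconds. Hypothetical helper, not part of SLURM.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

INTERACTIVE_LIMIT=$(to_seconds 04:00:00)   # interactive queue: 4 hours
requested=$(to_seconds 02:00:00)           # the value passed to -t

if [ "$requested" -le "$INTERACTIVE_LIMIT" ]; then
    echo "request fits within the interactive queue limit"
else
    echo "request exceeds the interactive queue limit" >&2
fi
```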