
Running GPU programs

This section gives a short introduction on how to run programs with GPU accelerators on Vuori.

General information

There are 7 compute nodes targeted for GPU (Graphics Processing Unit) accelerated jobs. Two overlapping partitions (queues) are available: the partition gpu contains all 7 nodes and the partition gpu6g contains 2 of them. Each node has two GPU accelerators, so it is important to allocate just one GPU if your code cannot exploit more. The accelerators are CUDA-enabled NVIDIA GPUs. The nodes in the gpu6g partition have NVIDIA Tesla M2070 accelerators with 6 GB of memory (5.25 GB available to the user); the rest of the nodes have NVIDIA Tesla M2050 accelerators with 3 GB of memory. In addition to these 7 nodes, there is one node, g8, that is not included in the partitions. It is accessible using ssh and is reserved for debugging, profiling, and other interactive development use.
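
The current state of the GPU partitions and their nodes can be checked with standard Slurm commands; this is a generic Slurm check, not something specific to Vuori:

sinfo -p gpu,gpu6g
scontrol show partition gpu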


Interactive usage

As mentioned in the general information part, node g8 is reserved for interactive use. Users can log in to g8 from the Vuori login nodes using ssh:

vuori> ssh g8
[your password]
g8> module load cuda

Note that you cannot access your $HOME directory from g8, so you have to copy all needed files to your $WRKDIR (see Storing and moving files for more information).
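
Because $HOME is not visible on g8, a simple way to stage your files is to copy them into $WRKDIR on a login node before logging in to g8. The directory name my_gpu_project below is only an example:

vuori> cp -r $HOME/my_gpu_project $WRKDIR/
vuori> ssh g8
g8> cd $WRKDIR/my_gpu_project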



GPU accelerated serial jobs

Before running a job load the CUDA environment

module load cuda

GPU resources are allocated at job submission time using the --gres option. Both the salloc and srun commands must have this option. The option takes an argument specifying which resource is required (gpu) and how many are needed per node (default 1). Because each node has 2 accelerators, the maximum number of GPU resources per node is also 2.


Examples

Allocate one GPU and one CPU core from the partition gpu and run my job (run time one hour):

salloc -t 1:00:00 -p gpu --gres=gpu:1 srun --gres=gpu:1 ./my_gpu_application

Allocate two GPUs and one CPU core from the partition gpu6g and run my job (run time 3 hours).

salloc -t 3:00:00 -p gpu6g --gres=gpu:2 srun --gres=gpu:2 ./my_multi_gpu_application

Access to the development node (for interactive development usage), via ssh login from a Vuori login node:

ssh g8
[your password]
module load cuda

You can also start an interactive shell on the compute nodes using srun:

srun -t 1:00:00 -p gpu -n 1 --gres=gpu:1 --pty $SHELL
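
Inside the interactive shell you can check which GPU was allocated to you, for example with NVIDIA's nvidia-smi utility (typically available once the cuda module is loaded):

module load cuda
nvidia-smi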

Batch job script example

#!/bin/csh
###
### GPU accelerated batch job example
###

## name of your job
#SBATCH -J my_GPUjob

## system error message output file
#SBATCH -e my_GPUjob_err_%j

## system message output file
#SBATCH -o my_GPUjob_out_%j

## how long a job takes, wallclock time hh:mm:ss
#SBATCH -t 01:01:00

## number of CPU cores
#SBATCH -n 1

## partition (gpu or gpu6g)
#SBATCH -p gpu

## how many GPUs per node
#SBATCH --gres=gpu:1

module load cuda

## run my GPU accelerated executable
## IMPORTANT: give the --gres option ALSO here
srun --gres=gpu:1 ./my_gpu_executable
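
Save the script, for example as my_gpujob.csh (the file name is arbitrary), submit it with sbatch, and follow the job status with squeue:

sbatch my_gpujob.csh
squeue -u $USER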

GPU accelerated MPI jobs

Settings and parameters for GPU accelerated MPI jobs depend on the programming model. We present here only the settings for jobs that use a single MPI process per GPU; for other cases, change the resource allocation accordingly. Note that the --gres option allocates GPU resources at the node level, and all MPI processes on a node can access all GPU resources allocated to that particular node. In addition, several processes can attach to a single GPU, so the MPI processes have to be set up carefully to utilize the resources efficiently (see the sketch below). For the one-process-one-GPU model, undersubscribe the nodes using the --ntasks-per-node=2 option (see also the Running programs section).
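
As an illustration of the one-process-one-GPU model, the sketch below shows one possible way for each MPI process to pick its own device. It is not part of the Vuori documentation or of the simpleMPI example used later; it assumes the job is launched with srun, which sets the SLURM_LOCALID environment variable to the local rank of the task on its node, and falls back to the global rank if that variable is not set.

/* Minimal sketch: bind each MPI process to one GPU on its node. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank, ndev, dev;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* number of GPUs visible to this process on its node */
    cudaGetDeviceCount(&ndev);

    /* SLURM_LOCALID is the task's local rank on its node when launched
       with srun; fall back to the global rank if it is not set */
    const char *localid = getenv("SLURM_LOCALID");
    dev = (localid != NULL ? atoi(localid) : rank) % ndev;

    cudaSetDevice(dev);   /* one MPI process <-> one GPU */
    printf("MPI rank %d uses GPU %d of %d on its node\n", rank, dev, ndev);

    /* ... launch kernels, do the GPU accelerated work of this process ... */

    MPI_Finalize();
    return 0;
}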

Examples

Two processes on one node, using both GPUs:
salloc -t 3:00:00 -p gpu -n 2 --gres=gpu:2 srun --gres=gpu:2 ./test

Eight processes and eight GPUs on four nodes. Here we use the --ntasks-per-node option to distribute two processes to each node:

salloc -t 1:00:00 -p gpu -n 8 --ntasks-per-node=2 --gres=gpu:2 srun --gres=gpu:2 ./test

Batch job script example:

#!/bin/csh
###
### GPU accelerated MPI job example: launch four processes on two nodes (four GPUs in total)
###

## name of your job
#SBATCH -J simple_MPI_GPUjob

## system error message output file
#SBATCH -e simple_MPI_GPUjob_err_%j

## system message output file
#SBATCH -o simple_MPI_GPUjob_out_%j

## how long a job takes, wallclock time hh:mm:ss
#SBATCH -t 00:30:00

## number of CPU cores
#SBATCH -n 4

## partition (gpu or gpu6g)
#SBATCH -p gpu

## how many GPUs per node
#SBATCH --gres=gpu:2

## undersubscribe the nodes, only two processes per node (one for each GPU)
#SBATCH --ntasks-per-node=2

module load cuda

## run my GPU accelerated MPI executable
## IMPORTANT: give the --gres option ALSO here
srun --gres=gpu:2 ./simpleMPI
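
To build such an MPI + CUDA program, one common approach is to compile the CUDA parts with nvcc and link with the MPI compiler wrapper. The exact wrapper names and library paths depend on the MPI and CUDA modules loaded on the system, so the lines below are only a sketch; in particular, CUDA_HOME is an assumption, check which variable the cuda module sets on Vuori:

module load cuda
nvcc -c gpu_kernels.cu
mpicc simpleMPI.c gpu_kernels.o -o simpleMPI -L$CUDA_HOME/lib64 -lcudart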