Login node jobs
On a login node (murska.csc.fi), users may run only small test programs without the scheduler (LSF-SLURM). For example, a small serial program and a small parallel program can be run, respectively, with:
./my_serial_test_executable
/opt/hpmpi/bin/mpirun -np 2 ./my_parallel_mpi_test_executable
Some programs need a purpose-built environment, which is set up by a module file. Typically, a module file contains instructions that set or alter shell environment variables, such as PATH, to give access to installed software or libraries. The currently loaded modulefiles can be listed with:
module list
All available modulefiles can be listed with:
module avail
If the program needs shared libraries that are defined in a modulefile, load the modulefile before running the executable. This is not necessary if the modulefile has already been loaded, either by you or by default. For example:
module load modulefilename
module load mpi (this example loads the default mpi environment)
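The check described above (load a modulefile only when it is not already loaded) can be sketched as a small shell function. This is a minimal sketch, not a site-provided tool: the name ensure_module is hypothetical, and the match against the module list output is a plain substring search, so a short name such as mpi would also match a longer one that contains it.

```shell
# ensure_module is a hypothetical helper name, not a command provided on the system.
# It loads a modulefile only if "module list" does not already mention it.
ensure_module() {
    name=$1
    # The module command is normally a shell function set up at login.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available in this shell" >&2
        return 1
    fi
    # Many versions of the modules package print the list on stderr,
    # so both output streams are searched (substring match only).
    if module list 2>&1 | grep -q -- "$name"; then
        echo "$name is already loaded"
    else
        module load "$name" && echo "$name loaded"
    fi
}
```

After the function has been defined in the current shell, ensure_module mpi would either report that mpi is already loaded or load the default mpi environment.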
Compute node jobs
On the login node, a job is submitted to LSF-SLURM, which places the job in a queue and allows it to run when the necessary resources become available on the compute nodes. In brief, LSF allocates the resources and SLURM provides an execution layer that launches tasks on all nodes in the allocation.
IMPORTANT: All files needed by a job, for example the program and its input/output files, must be copied to $WRKDIR. Remember to give module load modulefilename commands if needed. The limits (maximum number of cores, maximum runtime) of interactive sessions are listed at the bottom of this page.
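Copying the job files to $WRKDIR could be done along the following lines. This is only a sketch: the function name stage_to_wrkdir and the subdirectory name used in the example are hypothetical, and the only thing taken from this page is that the files must end up under $WRKDIR.

```shell
# stage_to_wrkdir is a hypothetical helper: it copies the given files into
# a subdirectory of $WRKDIR and prints the path of that subdirectory.
stage_to_wrkdir() {
    if [ "$#" -lt 2 ]; then
        echo "usage: stage_to_wrkdir subdir file..." >&2
        return 1
    fi
    jobdir=$1
    shift
    # Refuse to continue if the work directory variable is not defined.
    if [ -z "${WRKDIR:-}" ]; then
        echo "WRKDIR is not set" >&2
        return 1
    fi
    mkdir -p "$WRKDIR/$jobdir" || return 1
    cp -- "$@" "$WRKDIR/$jobdir/" || return 1
    echo "$WRKDIR/$jobdir"
}
```

For example, stage_to_wrkdir my_job ./my_serial_executable input.dat would copy both files to $WRKDIR/my_job (my_job and input.dat are placeholder names).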
Interactive serial job without GUI (Graphical User Interface).
bsub -n 1 -W 02:00 -Ip $SHELL -i (allocates the resources, LSF)
srun ./my_serial_executable (launch the job, SLURM)
exit (exit the allocation)
Options:
-n number of processes (number of cores)
-W running time, wallclock, format hh:mm (hours:minutes)
-Ip interactive job
SEE ALSO
man bsub, man srun
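The -W option expects the wallclock limit as hh:mm. If the runtime estimate is in minutes, a small helper such as the one below can produce that format. The name to_hhmm is hypothetical and the helper is only a convenience sketch.

```shell
# to_hhmm is a hypothetical helper: it converts whole minutes to the
# hh:mm form used by the -W option.
to_hhmm() {
    minutes=$1
    printf '%02d:%02d\n' $(( $minutes / 60 )) $(( $minutes % 60 ))
}

to_hhmm 120   # prints 02:00, the limit used in the example above
```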
The command bsub gives a command-line prompt only if the parameter -i is passed to the shell. However, some advanced control keys might not work. You may try using xterm instead of $SHELL (see below).
Interactive Non-MPI parallel job without GUI
bsub -n 4 -M 1048576 -W 01:30 -Ip $SHELL -i
srun ./my_executable
exit
Options:
-n number of processes (number of cores)
-W running time, wallclock, format hh:mm (hours:minutes)
-Ip interactive job
-M per process memory limit (KB) (example 1GB = 1048576 KB)
Alternatively (LSF and SLURM on the same command line):
bsub -n 4 -M 1048576 -W 01:30 -Ip srun ./my_executable
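Because -M takes the per-process memory limit in KB, it can help to compute the value instead of typing it by hand. The helpers below are a minimal sketch with hypothetical names (gb_to_kb, mb_to_kb), using 1 GB = 1048576 KB as stated in the options above.

```shell
# Hypothetical helpers: convert whole gigabytes or megabytes to the
# kilobyte value expected by the -M option.
gb_to_kb() { echo $(( $1 * 1024 * 1024 )); }
mb_to_kb() { echo $(( $1 * 1024 )); }

gb_to_kb 1     # prints 1048576
mb_to_kb 512   # prints 524288
```

The result can be substituted directly on the command line, for example bsub -n 4 -M $(gb_to_kb 1) -W 01:30 -Ip srun ./my_executable.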
Interactive MPI-parallel job without GUI
bsub -n 4 -M 1048576 -W 00:30 -Ip $SHELL -i
mpirun -srun ./my_MPI_executable
exit
bsub options:
-n number of processes (number of cores)
-W running time, wallclock, format hh:mm (hours:minutes)
-Ip interactive job
-M per process memory limit (KB)
Alternatively (LSF and SLURM on the same command line):
bsub -n 4 -M 1048576 -W 00:30 -Ip mpirun -srun ./my_MPI_executable
Serial or parallel interactive session where a graphical user interface (GUI) is necessary.
Serial session (one-core session). Remember to give all necessary bsub options (memory and runtime requirements).
bsub -Ip xterm
Multiple-core session (4 cores in the example below). Remember to give all necessary bsub options (number of cores, memory and runtime requirements).
bsub -n 4 -Ip xterm
These commands open an X-terminal window where you can launch a serial or parallel application and where the prompt also supports all control characters. After the xterm session has started, commands can be entered normally:
srun ./my_serial_executable
mpirun -srun ./my_MPI_executable
How to submit a serial or parallel batch job.
bsub < my_job_script
All files needed by a job, for example the program and input/output files, must be in $WRKDIR.
Do not forget the '<'; it is essential.
$WRKDIR is available on all nodes and always refers to the same location.
Serial batch job.
#!/bin/csh
###
### serial job script example
###
# execution shell environment
#BSUB -L /bin/csh
## name of your job, %J will show as your jobID
#BSUB -J my_jobname%J
## system error message output file
#BSUB -e my_output_err_%J
## system message output file
#BSUB -o my_output_%J
## send email notification when the job is finished
#BSUB -N
## a per-process (soft) memory limit
## limit is specified in KB
## example: 1 GB is 1048576
#BSUB -M 524288
## how long a job takes, wallclock time hh:mm
#BSUB -W 01:01
## number of processes
#BSUB -n 1
## run my executable
srun my_serial_program
## bjobs will save some information about my job
bjobs -l $LSB_JOBID
Parallel batch job.
#!/bin/csh
###
### parallel job script example
###
# Initializes the execution environment
#BSUB -L /bin/csh
## name of your job, %J will show as your jobID
#BSUB -J my_jobname%J
## system error message output file
#BSUB -e my_output_err_%J
## system message output file
#BSUB -o my_output_%J
## send email notification when the job is finished
#BSUB -N
## a per-process (soft) memory limit
## limit is specified in KB
## example: 1 GB is 1048576
#BSUB -M 1048576
## how long a job takes, wallclock time hh:mm
#BSUB -W 11:01
## the number of processes (number of cores)
#BSUB -n 4
## run my MPI executable
/opt/hpmpi/bin/mpirun -srun my_mpi_program
## bjobs will save some information about my job
bjobs -l $LSB_JOBID
Remember to include module load modulefilename commands in a script if needed. Loading all needed modulefiles in the script ensures that the environment is always correct. Note that since July 2008 the modules environment has been initialized automatically for all shells.
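The two job scripts above differ only in a few values (number of cores, memory limit, wallclock time and the command to run). As an illustration, a script of the same shape could be generated from those values. This is a sketch under assumptions: make_job_script is a hypothetical helper, it keeps only a subset of the #BSUB lines shown above, and it prints the script to standard output rather than submitting anything.

```shell
# make_job_script is a hypothetical helper: it prints a csh job script
# modelled on the examples above.
# Arguments: cores, per-process memory limit in KB, wallclock hh:mm, command...
make_job_script() {
    cores=$1
    mem_kb=$2
    wall=$3
    shift 3
    cat <<EOF
#!/bin/csh
#BSUB -L /bin/csh
#BSUB -e my_output_err_%J
#BSUB -o my_output_%J
#BSUB -M $mem_kb
#BSUB -W $wall
#BSUB -n $cores
$*
bjobs -l \$LSB_JOBID
EOF
}

# Print the parallel example; nothing is submitted here.
make_job_script 4 1048576 11:01 /opt/hpmpi/bin/mpirun -srun my_mpi_program
```

Redirecting the output to a file (for example > my_job_script) gives a script that can then be submitted on the login node with bsub < my_job_script. Add module load lines to the generated script by hand if the program needs them.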
Available queues
The command bqueues displays the available queues and some of their properties. These may change from time to time. The following queues were available for customers when this chapter was written:
serial : 1 core / 4h/7d def/max runtime / not interactive
parallel : 256 cores / 4h/2d def/max runtime / not interactive
interactive : 32 cores / 1h/4h def/max runtime / interactive
longrun : 128 cores / 8h/21d def/max runtime / not interactive
NB! You run jobs in the longrun queue at your own risk. If a batch job in that queue stops prematurely, no compensation is given for lost CPU time!
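The limits in the queue list can be turned into a simple lookup for batch jobs. The sketch below is only illustrative: pick_queue is a hypothetical helper, it assumes that the core counts in the list are maxima, it hard-codes the limits that were valid when this chapter was written, and it ignores the interactive queue. Check bqueues for the current values.

```shell
# pick_queue is a hypothetical helper: given a core count and a runtime
# in whole hours, it prints the first batch queue from the list above
# whose limits fit. Limits are copied from the list and may be outdated.
pick_queue() {
    cores=$1
    hours=$2
    if [ "$cores" -eq 1 ] && [ "$hours" -le 168 ]; then       # serial: 1 core, 7 d
        echo serial
    elif [ "$cores" -le 256 ] && [ "$hours" -le 48 ]; then     # parallel: 256 cores, 2 d
        echo parallel
    elif [ "$cores" -le 128 ] && [ "$hours" -le 504 ]; then    # longrun: 128 cores, 21 d
        echo longrun
    else
        echo "no listed batch queue fits" >&2
        return 1
    fi
}

pick_queue 64 24   # prints parallel
```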