
You can find the slides for this tutorial here.

This tutorial assumes you have a NYU HPC user account. If you don't have an account, you may apply for an account here.

Introduction to the Prince Cluster

A Linux cluster consists of hundreds of computing nodes interconnected by high-speed networks, with the Linux operating system running on each node individually. The resources are shared among many users for their technical or scientific computing purposes. Slurm is a cluster software layer built on top of the interconnected nodes that orchestrates the nodes' computing activities, so that the cluster can be viewed by its users as a unified, enhanced and scalable computing system. On the NYU HPC clusters, users from many departments, with diverse disciplines, subjects and computing projects, impose very diverse requirements on hardware, software resources and processing parallelism. Users submit jobs, which compete for computing resources. Slurm is a resource manager and job scheduler, designed to allocate resources and schedule jobs. It is open-source software with a large user community, and has been installed on many Top 500 supercomputers.


This tutorial also assumes you are comfortable with the Linux command-line environment. To learn about Linux, please read Tutorial 1.

Prince computing nodes

Nodes  Cores/Node  CPU Type                         Memory Available To Jobs (GB)  GPU Cards/Node      Names
68     28          Intel(R) Broadwell @ 2.60GHz     125                            -                   c[01-17]-[01-04]
32     28          Intel(R) Broadwell @ 2.60GHz     250                            -                   c[18-25]-[01-04]
32     20          Intel(R) Haswell @ 2.60GHz       62                             -                   c[26-27]-[01-16]*
176    20          Intel(R) IvyBridge @ 3.00GHz     62                             -                   c[28-38]-[01-16]
48     20          Intel(R) IvyBridge @ 3.00GHz     188                            -                   c[39-41]-[01-16]
4      40          Intel(R) Skylake @ 2.40GHz       187.5                          -                   c42-[01-04]
12     40          Intel(R) Cascade Lake @ 2.50GHz  187                            -                   c[43-45]-[01-04]
4      20          Intel(R) Haswell @ 3.10GHz       500                            -                   c99-[01-04]
2      48          Intel(R) IvyBridge @ 3.00GHz     1510                           -                   c99-[05-06]
2      64          Intel(R) Xeon Phi @ 1.30GHz      186 (+16GB MCDRAM)             -                   c99-[07-08]
1      16          Intel(R) Skylake @ 3.50GHz       1500                           -                   c99-09
9      28          Intel(R) Broadwell @ 2.60GHz     250                            4 Tesla K80         gpu-[01-09]
4      28          Intel(R) Broadwell @ 2.60GHz     125                            4 GeForce GTX 1080  gpu-[10-13]
8      20          Intel(R) IvyBridge @ 2.50GHz     125                            8 Tesla K80         gpu-[23-30]
24     28          Intel(R) Broadwell @ 2.60GHz     250                            4 Tesla P40         gpu-[31-54]
8      28          Intel(R) Broadwell @ 2.60GHz     250                            4 Tesla P100        gpu-[60-67]
6      40          Intel(R) Skylake @ 2.40GHz       376                            4 Tesla V100        gpu-[68-73]
1      40          Intel(R) Skylake @ 2.40GHz       185                            2 Tesla V100        gpu-90
* c[26-27]-[01-16] represents two sets of nodes: c26-01 to c26-16, and c27-01 to c27-16
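For scripting purposes, the bracket notation is easy to expand with a small shell loop. Below is an illustrative sketch (not a Slurm command; Slurm tools accept the bracket form directly) that expands c26-[01-16] into individual node names:

```shell
# Expand c26-[01-16] into c26-01 ... c26-16.
# "seq -w" pads the numbers with leading zeros to equal width.
for i in $(seq -w 1 16); do
    printf 'c26-%s ' "$i"
done
echo
```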

File systems

Space     Environment Variable  Purpose                                         Flushed                                     Allocation/User
/archive  $ARCHIVE              Long-term storage                               NO                                          2 TB
/home     $HOME                 Small files, code                               NO                                          20 GB
/beegfs   $BEEGFS               File staging - workflows with many small files  YES - files unused for 60 days are deleted  2 TB
/scratch  $SCRATCH              File staging - frequent writing and reading     YES - files unused for 60 days are deleted  5 TB
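The environment variables in the table expand to per-user paths, so job scripts can reference storage without hard-coding a path. A minimal sketch (the /scratch/$USER fallback is only an assumption for running the snippet outside the cluster, where $SCRATCH is unset):

```shell
# $SCRATCH is set by the cluster's login environment; the fallback below
# is an assumption for illustration only.
SCRATCH=${SCRATCH:-/scratch/$USER}
jobdir="$SCRATCH/mytest1"    # a per-job staging directory under scratch
echo "$jobdir"
```

Composing paths this way keeps job scripts portable if the storage layout ever changes.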

For more details on the nodes and the file systems' hardware configuration, please follow the link "Cluster - Prince".


The Prince picture

NOTE: The cluster nodes can still access the internet directly. This may be useful when copying data from servers outside the NYU Network

NOTE: Alternatively, instead of login to the bastion hosts, you can use VPN to get inside NYU's network and access the HPC clusters directly. Instructions on how to install and use the VPN client are available here

NOTE: You can't do anything on the bastion hosts, except ssh to the HPC clusters (Prince, Dumbo).


Connecting to Prince

Logging onto the Prince cluster and submitting jobs is analogous to the triple jump, an Olympic event that originated in ancient Greece. First, open a terminal on your Mac workstation. If your workstation is outside the NYU network, follow these three steps:

  1. Hop  - from your workstation, ssh onto one bastion host
  2. Step - from any bastion host, ssh to the Prince cluster login node
  3. Jump - from any login node, run command "sbatch" or "srun" to submit jobs which will land on the computing node(s)

If you are inside the NYU network, the first step, 'hop', can be omitted.
See for instance a complete HPC session:

ITSs-Air-3:~ johd$ ssh

           ~~~~~~~   ~~~~~~~~~~~~~~~~~~~~          ~~~~~~~~~~~~~~
 This computer system is operated by New York University (NYU) and may be
 accessed only by authorized users.  Authorized users are granted specific,
 limited privileges in their use of the system.  The data and programs
 in this system may not be accessed, copied, modified, or disclosed without
 prior approval of NYU.  Access and use, or causing access and use, of this
 computer system by anyone other than as permitted by NYU are strictly pro-
 hibited by NYU and by law and may subject an unauthorized user, including
 unauthorized employees, to criminal and civil penalties as well as NYU-
 initiated disciplinary proceedings.  The use of this system is routinely
 monitored and recorded, and anyone accessing this system consents to such
 monitoring and recording.  Questions regarding this access policy should be
 directed (by e-mail) to or (by phone) to 212-998-3333.
 Questions on other topics should be directed to COMMENT (by email) or to
 212-998-3333 by phone.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'s password: 
Last login: Wed Jan 11 09:59:54 2017 from

[johd@hpc ~]$ ssh prince's password: 
Last login: Sat Jan 14 11:45:42 2017 from

[johd@log-1 ~]$ sbatch --wrap "hostname; echo 'hello, this is a test'"
Submitted batch job 9870

[johd@log-1 ~]$ exit

For access from a Windows station using PuTTY, please read below.


Step 1.

Enter "" for the host name, and leave the default port "22". If you want, you may enter
a name for the saved session, e.g. "hpcgw", and click "Save" to reuse it next time. Hit "Open".

Click "Yes" when the confirmation window shows up.

Step 2.

Enter your NetID username, and password. This will get you onto one of our bastion hosts.

Step 3.

On the hpc bastion host, enter the command "ssh" or "ssh prince" (the short hostname), answer "yes" to the question,
and type your NetID password. If everything goes smoothly, you will land on a Prince login node!

Note: ssh tunnelling is not required for Slurm tutorial classroom exercises.

Connecting to Prince through ssh tunnel


Describing Slurm commands

Submit jobs - [sbatch]

Batch job submission is accomplished with the command sbatch. As with Torque's qsub, we create a bash script that describes the job's requirements: what resources we need (memory and CPUs requested), what software and processing we want to run, and where to send the job's standard output and error, etc. After a job is submitted, Slurm finds the suitable resources, schedules and drives the job execution, and reports the outcome back to the user. The user can then return to look at the output files.

In the first example, we create a small bash script, run it locally, then submit it as a job to Slurm using sbatch, and compare the results.  

$ mkdir -p /scratch/$USER/mytest1
$ cd /scratch/$USER/mytest1

$ cat >
sleep 20
$ chmod +x  
# This is just for demo purpose. Real work should be submitted 
# to Slurm to run on computing nodes. 
$ ./ 
Mon Feb  6 15:34:52 EST 2017
Mon Feb  6 15:35:12 EST 2017 

$ sbatch 
Submitted batch job 22140
$ cat slurm-22140.out 
Mon Feb  6 15:35:21 EST 2017
Mon Feb  6 15:35:41 EST 2017


Follow the recipe below to submit a job. The job can be used later as an example for practicing how to check job status. In my test its running duration is about 7 minutes.

$ cd /scratch/$USER/mytest1
$ cp /share/apps/Tutorials/slurm/example/run-matlab.s .
$ cp /share/apps/Tutorials/slurm/example/thtest.m .
$ sbatch run-matlab.s
Submitted batch job 11615

Below is the content of the bash script "run-matlab.s" just used in the job submission:

#!/bin/bash
##SBATCH --nodes=1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --time=1:00:00
#SBATCH --mem=4GB
#SBATCH --job-name=myMatlabTest
#SBATCH --mail-type=END
#SBATCH --output=slurm_%j.out

module purge
module load matlab/2016b

cd /scratch/$USER/mytest1
cat thtest.m | srun matlab -nodisplay

For reference - Explanation of script


The script is given in /share/apps/Tutorials/slurm/example. Below is an annotated version with detailed explanation of the SBATCH directives used in the script:

#!/bin/bash
# This line tells the shell how to execute this script, and is unrelated 
# to SLURM.
# at the beginning of the script, lines beginning with "#SBATCH" are read by
# SLURM and used to set queueing options. You can comment out a SBATCH 
# directive with a second leading #, eg:
##SBATCH --nodes=1
# we need 1 node, will launch a maximum of one task. The task uses 2 CPU cores  
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
# we expect the job to finish within 1 hour. If it takes longer than 1
# hour, SLURM can kill it: 
#SBATCH --time=1:00:00
# we expect the job to use no more than 4GB of memory:
#SBATCH --mem=4GB
# we want the job to be named "myMatlabTest" rather than something generated 
# from the script name. This will affect the name of the job as reported
# by squeue: 
#SBATCH --job-name=myMatlabTest

# when the job ends, send me an email at this email address.
# replace with your email address, and uncomment that line if you really need to receive an email.
#SBATCH --mail-type=END
# both standard output and standard error are directed to the same file.
# It will be placed in the directory I submitted the job from and will
# have a name like slurm_12345.out
#SBATCH --output=slurm_%j.out
# once the first non-comment, non-SBATCH-directive line is encountered, SLURM 
# stops looking for SBATCH directives. The remainder of the script is executed
# as a normal Unix shell script.
# first we ensure a clean running environment:
module purge
# and load the module for the software we are using:
module load matlab/2016b
# the script will have started running in $HOME, so we need to move into the 
# directory we just created earlier
cd /scratch/$USER/mytest1
# now start the Matlab job:
cat thtest.m | srun matlab -nodisplay
# Leave a few empty lines at the end to avoid occasional EOF trouble.

The job has been submitted successfully, and as the example box shows, its job ID is 11615. Usually we should let the scheduler decide which nodes to run jobs on. If there is a need to request a specific set of nodes, use the --nodelist directive, e.g. '#SBATCH --nodelist=c09-01,c09-02'.


Check cluster status - [sinfo, squeue]

The sinfo command gives information about the cluster status, by default listing all the partitions. Partitions group computing nodes into logical sets serving various purposes, such as interactivity, visualization and batch processing.

A partition is a group of nodes. A partition can be made up of nodes with a specific feature or functionality, such as nodes equipped with GPU accelerators (the gpu partition). A partition can have specific parameters, such as how long its jobs may run. Partitions can thus be thought of as "queues" in other batch systems. Partitions may overlap.

$ sinfo
c01_25*      up 1-00:00:00      4    mix c13-[01-04]
c01_25*      up 1-00:00:00     95   idle c01-[01-04],c02-[01-04],c03-[01-04],c04-[01-04],c05-[01-04],c06-[01-04],c07-[01-04],c08-[01-04],c09-[01-04],c10-[01-04],c11-[01-04],c12-[01-04],c14-[01-04],c15-[01-04],c16-[01-04],c17-[01-03],c18-[01-04],c19-[01-04],c20-[01-04],c21-[01-04],c22-[01-04],c23-[01-04],c24-[01-04],c25-[01-04]
c26          up 1-00:00:00     16   idle c26-[01-16]
c27          up 1-00:00:00     16   idle c27-[01-16]
gpu          up 1-00:00:00      2    mix gpu-[01-02]
gpu          up 1-00:00:00      7   idle gpu-[03-09]
sinfo by default prints information aggregated by partition and node state. As shown above, there are four partitions, namely c01_25, c26, c27 and gpu. The partition marked with an asterisk is the default one. Apart from the two lines with node state 'mix', which means some CPU cores are occupied, all other nodes are idle.

Here are two useful sinfo examples: the first lists the idle nodes in the gpu partition; the second outputs information in a node-oriented format.

$ sinfo -p gpu -t idle
gpu          up 1-00:00:00      5   idle gpu-[05-09]

$ sinfo -lNe
Mon Jan 16 15:05:49 2017
c01-01         1   c01_25*        idle   28   2:14:1 128826    61889      1   (null) none                
c01-02         1   c01_25*        idle   28   2:14:1 128826    61889      1   (null) none                
c01-03         1   c01_25*        idle   28   2:14:1 128826    61889      1   (null) none                
c01-04         1   c01_25*        idle   28   2:14:1 128826    61889      1   (null) none                
c02-01         1   c01_25*        idle   28   2:14:1 128826    61889      1   (null) none                
c02-02         1   c01_25*        idle   28   2:14:1 128826    61889      1   (null) none                
c02-03         1   c01_25*        idle   28   2:14:1 128826    61889      1   (null) none                
c02-04         1   c01_25*        idle   28   2:14:1 128826    61889      1   (null) none                
c03-01         1   c01_25*        idle   28   2:14:1 128826    61889      1   (null) none                
c03-02         1   c01_25*        idle   28   2:14:1 128826    61889      1   (null) none 


The squeue command lists jobs that are running, waiting, completing, etc. It can also display jobs owned by a specific user or with a specific job ID.

$ squeue
              9874    c01_25 model_ev     johd  R      17:00      4 c13-[01-04]
              9868       gpu relases-    xh814  R   17:45:45      1 gpu-01
              9869       gpu amberGPU    xh814  R    1:30:19      1 gpu-01
              9873       gpu  pemed_1     johd  R      17:08      1 gpu-02

$ squeue -u johd
              9874    c01_25 model_ev     johd  R      22:19      4 c13-[01-04]

$ squeue -j 9874 -o "%.18i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %R %m"
              9874    c01_25 model_ev     johd  RUNNING      23:31   1:00:00      4 c13-[01-04] 2000M

Run 'man sinfo' or 'man squeue' to see the explanations for the results.


Check job status - [squeue, sstat, sacct]

With the job ID in hand, we can track the job's status through its lifetime. The job first appears in the Slurm queue in the PENDING state. When its required resources become available and it is the job's turn to run by priority, resources are allocated and the job transitions to the RUNNING state. If the job runs to the end and completes successfully, it goes to the COMPLETED state; otherwise it ends in the FAILED state. Use squeue -j <jobID> to check a job's status.

$ squeue -j 9877
              9877       gpu  pemed_1     johd  R       0:10      1 gpu-02

Most of the columns in the output of the squeue command are self-explanatory. 

The column "ST" in the middle is the job status, which can be :

  • PD - pending: waiting for resource allocation
  • S  - suspended
  • R  - running
  • F  - failed: non-zero exit code or other failures
  • CD - completed: all processes terminated with zero exit code
  • CG - completing: in the completing process, some processes may still be alive
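As a quick reference, the short codes above can be expanded to the long state names (the form printed by squeue's "%T" format) with a tiny helper function. This is an illustrative shell sketch, not part of Slurm:

```shell
# Map squeue's short "ST" codes to long state names (illustrative helper,
# covering only the codes listed in this tutorial):
state_name() {
  case "$1" in
    PD) echo "PENDING"    ;;
    S)  echo "SUSPENDED"  ;;
    R)  echo "RUNNING"    ;;
    F)  echo "FAILED"     ;;
    CD) echo "COMPLETED"  ;;
    CG) echo "COMPLETING" ;;
    *)  echo "UNKNOWN"    ;;
  esac
}
state_name R     # prints RUNNING
state_name CG    # prints COMPLETING
```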

The column "NODELIST(REASON)" at the end shows, for a pending job, the reason(s) it is waiting, which can be:

  • JobHeldUser:            the job has been held by the user
  • Priority:               higher priority jobs exist
  • Resources:              waiting for resources to become available
  • BeginTime:              start time not reached yet
  • Dependency:             waiting for a job it depends on to finish
  • QOSMaxCpuPerUserLimit:  the per-user CPU core limit has been reached

You may select which columns to display, each in a width specified with an integer number between "%." and a letter, e.g. %.10M.

$ squeue -j 9874 -o "%.18i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %R %m"
              9874    c01_25 model_ev     johd  RUNNING      23:31   1:00:00      4 c13-[01-04] 2000M

Run the command sstat to display various information about a running job/step. Run the command sacct to check accounting information of jobs and job steps in the Slurm log or database. Both commands have a '--helpformat' option that lists the available output columns.

$ sstat -j 23221 -o JobID,NodeList,Pids,MaxRSS,AveRSS,MaxVMSize
       JobID             Nodelist                 Pids     MaxRSS     AveRSS  MaxVMSize 
------------ -------------------- -------------------- ---------- ---------- ---------- 
23221.0                    c03-04               158503   3462088K   2681124K   8357328K 
$ sacct -j 7050 --format JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize
       JobID    JobName   NTasks        NodeList     MaxRSS  MaxVMSize     AveRSS  AveVMSize 
------------ ---------- -------- --------------- ---------- ---------- ---------- ---------- 
7050         mpiexec-t+              c17-[01-03]                                             
7050.batch        batch        1          c17-01    149112K    208648K    149112K    113260K 
7050.extern      extern        3     c17-[01-03]          0      4316K          0      4316K 
7050.0            orted        2     c17-[02-03]    141016K    370880K    140024K    370868K

Type "man <command>" to look up detailed usage in the manual pages of squeue, sstat and sacct.


Cancel a job - [scancel]

Things can go wrong, or proceed in unexpected ways. Should you decide to terminate a job before it finishes, scancel is the tool to help.

$ squeue -j 9877
              9877       gpu  pemed_1     johd  R       9:04      1 gpu-02
$ scancel 9877

Checking job history - [sacct]

If you need to check your job history, sacct is the command to use.

$ sacct --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,AveCPU,MaxRss,MaxVMSize,nnodes,ncpus,nodelist\
 --starttime=2019-01-01 --endtime=2019-09-15 -u <your netid>


Look at job results

Job results include the job execution logs (standard output and error) and, of course, any output data files defined when submitting the job. Log files are created in the working directory, and output data files in your specified directory. Examine the log files with a text viewer or editor to gain a rough idea of how the execution went. Open the output data files to see exactly what result was generated. Run the sacct command to see resource usage statistics. Should you decide that the job needs to be rerun, submit it again with sbatch, with a modified batch script and/or updated execution configuration. Iteration is one characteristic of typical data analysis!


Software and Environment Modules

Environment Modules is a tool for managing multiple versions and configurations of software packages, and is used by many HPC centers around the world. With Environment Modules, software packages are installed away from the base system directories, and for each package an associated modulefile describes what must be altered in a user's shell environment - such as the $PATH environment variable - in order to use the software package. The modulefile also describes dependencies and conflicts between this software package and other packages and versions.

To use a given software package, you load the corresponding module. Unloading the module afterwards cleanly undoes the changes that loading the module made to your environment, thus freeing you to use other software packages that might have conflicted with the first one.

Working with software packages on the NYU HPC clusters.

Command                      Functionality
module avail                 Check what software packages are available
module whatis module-name    Find out more about a software package
module help module-name      Show more detailed help for the package, if the modulefile provides it
module show module-name      See exactly what effect loading the module will have
module list                  Check which modules are currently loaded in your environment
module load module-name      Load a module
module unload module-name    Unload a module
module purge                 Remove all loaded modules from your environment


Running interactive jobs 

The majority of jobs on the Prince cluster are submitted with the sbatch command and executed in the background. These jobs' steps and workflows are predefined by users, and their execution is driven by the scheduler.

There are cases when users need to run applications interactively (interactive jobs). Interactive jobs allow the users to enter commands and data on the command line (or in a graphical interface), providing an experience similar to working on a desktop or laptop. Examples of common interactive tasks are:

  • Editing files
  • Compiling and debugging code
  • Exploring data, to get a rough idea of its characteristics
  • Getting graphical windows to run visualization
  • Running software tools in interactive sessions

Since the login nodes of the Prince cluster are shared by many users, running interactive jobs that require significant computing and IO resources on the login nodes impacts many users.

Interactive jobs on Prince Login nodes

Running compute and IO intensive interactive jobs on the Prince login nodes is not allowed. Jobs may be removed without notice. 

Instead of running interactive jobs on the login nodes, users can run them on Prince compute nodes using Slurm's srun utility. Running interactive jobs on compute nodes does not impact other users, and in addition provides access to resources that are not available on the login nodes, such as interactive access to GPUs, high memory, exclusive access to all the resources of a compute node, etc. There is no partition on Prince reserved for interactive jobs.

Through srun, Slurm provides rich command-line options for users to request resources from the cluster for interactive jobs. Please see the examples and short accompanying explanations in the code block below, which should cover many of the use cases.

# In the srun examples below, "--pty /bin/bash" requests a bash shell session in a pseudo terminal
# By default the allocation is a single CPU core and 2GB of memory for 1 hour
$ srun --pty /bin/bash
# To request 4 CPU cores, 4 GB memory, and a 2-hour running duration
$ srun -c4 -t2:00:00 --mem=4000 --pty /bin/bash
# To request one GPU card, 3 GB memory, and a 1.5-hour running duration
$ srun -t1:30:00 --mem=3000 --gres=gpu:1 --pty /bin/bash

srun has an option "--x11", which enables X forwarding, so programs using a GUI can be run during an interactive session (provided you have X forwarding to your workstation set up). If necessary, please read the wiki pages on how to set up X forwarding for Windows and Linux / Mac workstations. NOTE: X forwarding is not required for Slurm tutorial classroom exercises.

# To request computing resources, and export x11 display on allocated node(s)
$ srun --x11 -c4 -t2:00:00 --mem=4000 --pty /bin/bash
$ xterm  # check if xterm popping up okay
# To request GPU card etc, and export x11 display
$ srun --x11 -t1:30:00 --mem=3000 --gres=gpu:1 --pty /bin/bash
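Before launching a GUI program in an interactive session, it can save time to check that an X display actually arrived. This is a plain-shell sketch; ssh -X/-Y and srun --x11 are what set $DISPLAY for you:

```shell
# If X forwarding is working, DISPLAY points at a forwarded X display
# (e.g. "localhost:10.0"); if it is empty, GUI programs will fail to open.
if [ -n "$DISPLAY" ]; then
    echo "X display available: $DISPLAY"
else
    echo "no X display - set up X forwarding first"
fi
```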


Running R batch Job

Long-running and big data-crunching jobs ought to be submitted as batch jobs, so that they run in the background with Slurm driving their execution. Below are an R script "example.R" and a job script that can be used with the sbatch command to send a job to Slurm:

$ cat run-R.sbatch 
#!/bin/bash
#SBATCH --job-name=RTest
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --mem=2GB
#SBATCH --time=01:00:00

module purge
module load r/intel/3.3.2

cd /scratch/$USER/examples
R --no-save -q -f example.R > example.out 2>&1
$ cat example.R
df <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
indices <- order(df$x)

Run the job using "sbatch".

$ sbatch run-R.sbatch


R Interactive session

The following example shows how to work in an interactive R session on a compute node:

[sk6404@log-1 ~]$ srun -c 1 --pty /bin/bash
[sk6404@c17-01 ~]$ module purge
[sk6404@c17-01 ~]$ module list
No modules loaded
[sk6404@c17-01 ~]$ module load r/intel/3.3.2
[sk6404@c17-01 ~]$ module list
Currently Loaded Modules:
  1) jdk/1.8.0_111   2) intel/17.0.1   3) openmpi/intel/2.0.1   4) r/intel/3.3.2
[sk6404@c17-01 ~]$ R
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-centos-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> 5 + 10
[1] 15
> 6 ** 2
[1] 36
> tan(45)
[1] 1.619775
> q()
Save workspace image? [y/n/c]: n
[sk6404@c17-01 ~]$ exit
[sk6404@log-1 ~]$


Running GPU jobs

To request one GPU card, use SBATCH directives in job script:

#SBATCH --gres=gpu:1

To request a specific card type, use e.g. --gres=gpu:k80:1. The card types currently available are k80, p1080, p40, p100 and v100. As an example, let's submit an Amber job. Amber is a molecular dynamics software package. The recipe is:

$ mkdir -p /scratch/$USER/myambertest
$ cd /scratch/$USER/myambertest
$ cp /share/apps/Tutorials/slurm/example/amberGPU/* .
$ sbatch run-amber.s
Submitted batch job 14257 

From the tutorial example directory we copy over Amber input data files "inpcrd", "prmtop" and "mdin", and the job script file "run-amber.s". The content of the job script "run-amber.s" is:

#!/bin/bash
#SBATCH --job-name=myAmberJobGPU
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH --mem=3GB
#SBATCH --gres=gpu:1

module purge
module load amber/openmpi/intel/16.06

cd /scratch/$USER/myambertest
pmemd.cuda -O

The demo Amber job should take ~2 minutes to finish once it starts running. When the job is done, several output files are generated. Check the one named "mdout", which has a section most relevant here:


|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|                    Version 16.0.0
|                      02/25/2016

|------------------- GPU DEVICE INFO --------------------
|            CUDA_VISIBLE_DEVICES: 0
|   CUDA Capable Devices Detected:      1
|           CUDA Device ID in use:      0
|                CUDA Device Name: Tesla K80
|     CUDA Device Global Mem Size:  11439 MB
| CUDA Device Num Multiprocessors:     13
|           CUDA Device Core Freq:   0.82 GHz


Running array jobs

Using a job array, you may submit many similar jobs with almost identical requirements. This reduces load on both the users and the scheduler. Job arrays can only be used with batch jobs. Usually the only difference among the jobs in an array is the input file(s). Please follow the recipe below to try the example. There are 5 input files, named 'sample-1.txt' through 'sample-5.txt' in sequential order. By running the single command "sbatch --array=1-5 run-jobarray.s", you submit 5 jobs, each processing one of these input files.

$ mkdir -p /scratch/$USER/myjarraytest
$ cd /scratch/$USER/myjarraytest
$ cp /share/apps/Tutorials/slurm/example/jobarray/* .
$ ls
run-jobarray.s  sample-1.txt  sample-2.txt  sample-3.txt  sample-4.txt  sample-5.txt
$ sbatch --array=1-5 run-jobarray.s 
Submitted batch job 23240

The content of the job script 'run-jobarray.s' is copied below:

#!/bin/bash
#SBATCH --job-name=myJobarrayTest
#SBATCH --nodes=1 --ntasks=1
#SBATCH --time=5:00
#SBATCH --mem=1GB
#SBATCH --output=wordcounts_%A_%a.out
#SBATCH --error=wordcounts_%A_%a.err

module purge
module load python/intel/2.7.12

cd /scratch/$USER/myjarraytest
python sample-$SLURM_ARRAY_TASK_ID.txt

A job array submission defines the environment variable SLURM_ARRAY_TASK_ID, whose value is unique for each job in the array. It is usually embedded somewhere in the script so that at run time its value is incorporated into producing a proper file name. Also as shown above, two additional options, %A and %a, denoting the job ID and the task ID (i.e. the job array index) respectively, are available for specifying a job's stdout and stderr file names.
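To see the mechanism in isolation, here is the file-selection logic from the script above with the task ID hard-coded, simulating what task 3 of the array would compute (at run time Slurm sets the variable for you):

```shell
# Slurm sets SLURM_ARRAY_TASK_ID for each array task; we hard-code 3 here
# only to simulate task 3 outside of a job.
SLURM_ARRAY_TASK_ID=3
input="sample-${SLURM_ARRAY_TASK_ID}.txt"
echo "$input"     # prints sample-3.txt
```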



SLURM_* environment variables

To get the list of SLURM_* variables, you may run a job to check, e.g. srun sh -c 'env | grep SLURM | sort'. The command 'man sbatch' explains what these variables stand for. Below are a few frequently used ones:

  • SLURM_JOB_ID             -  the job ID
  • SLURM_SUBMIT_DIR         -  the job submission directory
  • SLURM_SUBMIT_HOST        -  name of the host from which the job was submitted
  • SLURM_JOB_NODELIST       -  names of nodes allocated to the job
  • SLURM_ARRAY_TASK_ID      -  job array job index
  • SLURM_JOB_CPUS_PER_NODE  -  CPU cores on this node allocated to the job
  • SLURM_NNODES             -  number of nodes allocated to the job
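A minimal batch script that records these variables for a real job could look like the sketch below (the job name is made up for this demo; the pipeline itself is ordinary shell):

```shell
#!/bin/bash
#SBATCH --job-name=slurmEnvProbe   # hypothetical name for this demo job
#SBATCH --time=00:01:00

# On the allocated node, print every SLURM_* variable, sorted:
env | grep '^SLURM_' | sort
```

Submit it with sbatch and the list lands in the job's slurm-<jobID>.out file.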

Exceeded step memory limit at some point

The current Slurm implementation uses Linux Control Groups (cgroups) for resource containment. If necessary, please see this page for a detailed description of cgroups.

If you get the correct outputs, please just ignore the warning message "slurmstepd: error: Exceeded job memory limit at some point". You can also check the job's exit state to confirm. For reference, there is some explanation in the bug report.
