Slurm Plugin

Workbench | Advanced

The Slurm Job Launcher Plugin provides the capability to launch executables on a Slurm cluster.

Configuration

/etc/rstudio/launcher.slurm.conf

Config Option Description Required (Y/N) Default Value
server-user User to run the executable as. The plugin should be started as root and will lower its privilege to this user for normal execution. It is recommended not to change the default value, as this is populated by the Launcher service itself. N rstudio-server
thread-pool-size Size of the thread pool used by the plugin. It is recommended not to change the default value, as this is populated by the Launcher service itself. N Number of CPUs * 2
enable-debug-logging Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled). N 0
scratch-path Scratch directory where the plugin writes temporary state. N /var/lib/rstudio-launcher
logging-dir Specifies the path where debug logs should be written. N /var/log/rstudio/launcher
job-expiry-hours Number of hours before completed jobs are removed from the system. N 24
profile-config Path to the user and group profiles configuration file (explained in more detail below). N /etc/rstudio/launcher.slurm.profiles.conf
slurm-service-user The user to run Slurm service commands as. This user must have privileges to query the Slurm cluster. Y
slurm-bin-path The installation location of the Slurm command line utilities (e.g. sbatch,scontrol). If left blank, the command line utilities must be available on the default path. N ""
user-storage-path The default location to store Slurm job output. Can be templated with {HOME} or {USER}. Users must have write access to the configured location. Paths beginning with ~ will be correctly evaluated. N ~/slurm-data
max-output-stream-seconds The maximum amount of time to keep the job output stream open after the job is completed, in seconds. Since job output may be buffered, the output stream will stay open until it sees an end of stream notifier or it waits the configured number of seconds. Setting this option to a low value may cause job output to appear truncated. Reloading the job output window should resolve that. A value of 0 will cause the output stream to close immediately when the job finishes. N 30
max-output-file-wait-seconds The maximum amount of time to wait for the output files to be created after the output stream is started, in seconds. This can be useful if job output is being buffered for a long period of time, or if the file system is particularly slow. Setting this to a value that is too low may cause job-output-not-found errors when attempting to retrieve output shortly after the job starts. N 30
rsandbox-path Location of the rsandbox executable. N /usr/lib/rstudio-server/bin/rsandbox
unprivileged Runs the Launcher in unprivileged mode. Child processes will not require root permissions. If the plugin cannot acquire root permissions it will run without root and will not change users or perform any impersonation. N 0
enable-gpus Whether to allow users to request GPU resources when submitting jobs. The types of GPUs available for request can be controlled via the gpu-types option. N 0
enable-gres Whether to allow users to request arbitrary GRES resources when submitting jobs. The value of this field will be passed directly to the --gres option provided by sbatch. N 0
gpu-types A comma-separated list of GPU types that are available. If GPUs are enabled and this field is empty, users will be able to request general GPUs (i.e. sbatch --gpus=<n>. Otherwise users will be able to request GPUs for each type (i.e. sbatch --gpus=<type>:<n>[,<type>:<n>]). N
allow-requeue Whether to allow jobs submitted through the Slurm Launcher Plugin to be re-queued by Slurm. This option requires special consideration when using with Workbench Launcher Sessions. See Using the Slurm Launcher plugin with Workbench for more details. N 0
persist-env-vars Whether to persist environment variables submitted on the job as Job metadata to allow the data to be persisted across restarts of the Launcher and made available in load balanced scenarios. This should be disabled if you allow Slurm users to view other users’ job data to prevent potentially sensitive information from leaking. See Job Isolation and Security for more details. N 1

User storage directory

If the user-storage-path is left as the default value (~/slurm-data) or the {HOME} variable is used in the definition, the server-user must have read access to each user’s home directory in order to look up the real value. The Slurm Launcher Plugin will then attempt to automatically create the configured user-storage-path. This action will be taken as the user starting the Job, so the server-user does not need write access to the user’s home directory.

Job isolation and security

The Slurm Launcher prevents users from seeing details about other users’ jobs when interacting with the Launcher. However, depending on your Slurm configuration, user jobs may be visible to other users by default when using Slurm commands like sacct directly. Some job fields, such as environment variables, could contain sensitive information, so it is recommended that you prevent Slurm users from viewing job data for other users.

This is configurable by setting the PrivateData configuration option within slurmdbd.conf. For more information, see the Slurm documentation.

User and group profiles

The Slurm plugin also allows you to specify user and group configuration profiles, similar to Posit Workbench’s profiles, in the configuration file /etc/rstudio/launcher.slurm.profiles.conf (or an arbitrary file as specified in profile-config within the main configuration; see above). These are entirely optional.

Profiles are divided into sections of three different types:

  • Global ([*])

  • Per-group ([@groupname])

  • Per-user ([username])

Here is an example profiles file that illustrates each of these types:

/etc/rstudio/launcher.slurm.profiles.conf
[*]
default-cpus=1
default-mem-mb=512
max-cpus=2
max-mem-mb=1024
max-gpus-v100=1
max-gpus-tesla=1
allowed-partitions=mars,jupiter

[@posit-power-users]
default-cpus=4
default-mem-mb=4096
max-cpus=20
max-mem-mb=20480
max-gpus-tesla=10
default-gpus-tesla=2
max-gpus-v100=8
default-gpus-v100=0
allowed-partitions=mars,jupiter,saturn,titan

[jsmith]
max-cpus=3

This configuration specifies that by default users will be allowed to launch jobs with a maximum of 1024 MB of memory and 2 CPUs to either the mars or jupiter partitions. It also specifies that members of the posit-power-users group will be allowed to use much more resources and additional partitions.

Note that the profiles file is processed from top to bottom (i.e. settings matching the current user that occur later in the file always override ones that appeared prior). The settings available in the file are described in more depth in the table below. Also note that if the Slurm cluster has been configured to have a maximum and/or default memory value, these values will be returned whenever a maximum or default value is not configured for a user.

/etc/rstudio/launcher.slurm.profiles.conf

Config Option Description Required (Y/N) Default Value
default-cpus Number of CPUs available to a job by default if not specified by the job. N 0.0 (infinite - managed by Slurm)
default-mem-mb Number of MB of RAM available to a job by default if not specified by the job. N 0.0 (infinite - managed by Slurm)
max-cpus Maximum number of CPUs available to a job. Setting this to a negative value will disable setting CPUs on a job. If set, the value of default-cpus will always be used. N 0.0 (infinite - managed by Slurm)
max-mem-mb Maximum number of MB of RAM available to a job. Setting this to a negative value will disable setting memory on a job. If set, the value of default-mem-mb will always be used. N 0.0 (infinite - managed by Slurm)
max-gpus Only valid if enable-gpus is set to 1 and gpu-types is empty in launcher.slurm.conf. Maximum number of GPUs that can be requested per job. N 0.0 (infinite - managed by Slurm)
default-gpus Only valid if enable-gpus is set to 1 and gpu-types is empty in launcher.slurm.conf. Number of GPUs available to a job by default if not specified by the job. N 0.0 (infinite - managed by Slurm)
max-gpus-<type> Only valid if enable-gpus is set to 1 and type is included in the gpu-types field in launcher.slurm.conf. Maximum number of GPUs of the specified type that can be requested per job. N 0.0 (infinite - managed by Slurm)
default-gpus-<type> Only valid if enable-gpus is set to 1 and type is included in the gpu-types field in launcher.slurm.conf. Number of GPUs of the specified type available to a job by default if not specified by the job. N 0.0 (infinite - managed by Slurm)
allowed-partitions A comma-separated list of partitions that may be used to launch jobs. The partition(s) must be valid partitions as seen by your Slurm cluster from the sinfo command. N (empty - all partitions may be used)
default-partition The default partition for jobs. N The cluster-wide default
resource-profiles Available resource profiles. See Resource profiles. N
allow-custom-resources Whether jobs can use the custom resource profile. See Resource profiles. N 1
singularity-image-directory Path to a directory of containers in the Singularity Image File (SIF) format. When set, the plugin will allow users to launch Apptainer or Singularity containers with these images. This directory should be on shared storage accessible to all Slurm nodes (e.g. /opt/apptainer/containers/). N “”
default-singularity-image Default singularity image name. N default.simg

Resource profiles

Resource profiles greatly simplify the task of assigning CPU, memory, or GPU resources for a job (provided that GPUs are available). They are configured in the optional /etc/rstudio/launcher.slurm.resources.conf file. For example:

/etc/rstudio/launcher.slurm.resources.conf

[default]
name = "Default" # optional, derived from the section name when absent
cpus=1
mem-mb=4096

[small]
cpus=1
mem-mb=512

[default-gpu]
cpus=1
mem-mb=4096
nvidia-gpus=1
amd-gpus=0

[hugemem]
name = "Huge Memory"
cpus=8
mem-mb=262144

By default, all profiles are available to all users, and jobs can also use a special custom profile to specify CPU, memory, and GPU resources directly instead. However, users are still subject to the constraints in User and group profiles, and administrators may also limit access to individual resource profiles with that configuration file.

For example, suppose an admin wants to restrict the resource profiles above such that (1) GPUs and large memory jobs are only available to users in the bioinformatics group; and (2) only users in the posit-power-users group can use the custom resource profile to set their own resources directly. This might result in the following /etc/rstudio/launcher.slurm.profiles.conf file:

/etc/rstudio/launcher.slurm.profiles.conf

[*]
resource-profiles=default,small
allow-custom-resources=0

[@bioinformatics]
resource-profiles=default,small,default-gpu,hugemem

[@posit-power-users]
resource-profiles=default,small,default-gpu,hugemem
allow-custom-resources=1

The settings available in each section of the /etc/rstudio/launcher.slurm.resources.conf file are described in more depth in the table below:

/etc/rstudio/launcher.slurm.resources.conf

Config Option Description Required (Y/N) Default Value
name A user-friendly name for the profile, e.g. Default (1 CPU, 4G mem) or m4.xlarge. N The section title
cpus The CPU limit. Y
mem-mb The memory limit, in megabytes. Y
gpus Only valid if enable-gpus is set to 1 and gpu-types is empty in launcher.slurm.conf. Number of GPUs. N 0
<type>-gpus Only valid if enable-gpus is set to 1 and type is included in the gpu-types field in launcher.slurm.conf. Number of GPUs of the specified type. N 0
partition The specific partition (queue) to use, if any. N

Slurm cluster requirements

In order for the Slurm plugin to run correctly, the following assumptions about the Slurm cluster must be true:

  • The Slurm service account (specified in the main configuration file) must have full cluster-admin privileges.
  • The Slurm control machine (the one running slurmctld), the Workbench server host, and all Slurm nodes must have a shared home directory.
  • The Workbench server host must have the following properties:
    • the Slurm executables installed (e.g. sinfo, scontrol, etc.). See Supported Slurm versions for supported versions. You may experience unexpected behavior with unsupported versions.
    • the same slurm.conf file as the desired Slurm cluster
    • network connectivity to the machine running slurmctld (i.e. the Workbench server host can resolve the IP or hostname of the Slurm control machine and connect via the slurmctld port configured in slurm.conf)
    • properly configured and running Slurm plugins, as required (e.g. if using MUNGE as an authentication service, munged must be running under the same user on all machines connected to the Slurm cluster)
    • properly configured users and groups (i.e. all users with the same name have the same UID, group, group ID on all machines connected to the cluster)

For more information about configuring and running a Slurm cluster, please see the Slurm documenation. Information about available Slurm plugins can also be found in the Slurm documentation in the relevant section. For example, here is the documentation about Slurm Accounting which also includes information about the available plugins and how to use them.

Below is an example of a launcher configuration which might be used in this scenario:

/etc/rstudio/launcher.conf
[server]
address=127.0.0.1
port=5559
server-user=rstudio-server
admin-group=rstudio-server
enable-debug-logging=1

[cluster]
name=Slurm
type=Slurm
/etc/rstudio/launcher.slurm.conf
slurm-service-user=slurm
job-expiry-hours=48
user-storage-path=~/slurm-data
max-output-stream-seconds=15
slurm-bin-path=/slurm/bin
enable-gpus=1
gpu-types=tesla,v100

Supported Slurm versions

Slurm Version Notes Certified (Y/N)
23.02 Y
22.05 Y
21.08 Y
20.11 Y
19.05 Slurm 19.05 no longer recommended due to CVE-2021-31215. N

Using the Slurm Launcher plugin with Posit Workbench

To support launching Workbench Sessions via the Slurm Launcher plugin, the following must be true in addition to the requirements listed in the Slurm cluster requirements section:

  • The Workbench host must have network access to every Slurm node that may run a session via any TCP port
  • Slurm nodes must have network access to the Workbench host via the launcher-sessions-callback-address, as described in Launcher Configuration
  • To incorporate RStudio Pro Session configurations, rsession.conf must be accessible by all Slurm nodes that may run RStudio Pro Sessions. The default expected location can be changed by adding rsession-config-file=<path/to/rsession.conf> to /etc/rstudio/rserver.conf

If the allow-requeue option in launcher.slurm.conf is enabled (i.e. allow-requeue=1) and Workbench sessions may be preempted by higher priority jobs, it is advisable to set the Slurm preemption mode to SUSPEND rather than REQUEUE to avoid any loss of data in the Session. For more details, please see the Slurm Preemption Documentation.

Singularity with Posit Workbench Requirements

Preview feature

This feature is in preview. Preview features are unsupported and may face breaking changes in a future release. Any issues found in the feature will be addressed during the regular release schedule; they will not result in immediate patches or hotfixes.

We encourage customers to try these features and we welcome any feedback via Posit Support, but we recommend that the feature not be used in production until it is in general availability (i.e., officially released as a full feature). To provide feedback, please email your Posit Customer Success representative or and specify that you are trialing this feature.

To use the Slurm launcher plugin with singularity support the following must be true in addition to the requirements listed in Using the Slurm launcher plugin with Posit Workbench:

  • The singularity-image-directory must be on shared storage mounted on both the Workbench server node and all Slurm compute nodes.
  • All Singularity images must have session components that match the Workbench server version. See installation instructions here.
  • The Workbench server has launcher-sessions-create-container-user=0 in the rserver.conf file.

Multiple versions of R and module loading

As described in the R versions section, it is possible to use multiple versions of R and load environment modules per R Version with RStudio Pro Sessions launched via the Slurm Launcher Plugin by configuring the /etc/rstudio/r-versions file. In order to properly support this feature the following must be true:

  • R must be installed on all Slurm nodes in the same location.
  • The modules in question must be installed on all Slurm nodes.
  • The file /var/lib/rstudio-server/r-versions must be reachable by all Slurm nodes. Note that this file is generated by Workbench, and that its location may be changed by setting r-versions-path=<shared directory>/r-versions in rserver.conf.

Load balancing considerations

When using the Slurm Launcher Plugin with a load balanced Workbench, it is recommended to configure the Slurm cluster and Slurm Launcher Plugins so that the values for job-expiry-hours are the same in all copies of launcher.slurm.conf and the value for MinJobAge in slurm.conf is at least as long as the configured job-expiry-hours value. Note that MinJobAge is set in seconds, rather than hours.

Additional considerations

This section lists notable considerations related to the use of the Slurm Plugin.

  • Slurm does not provide a time zone for any time values which it returns. All times related to Slurm jobs returned from the launcher will have the same time zone as the configured Slurm cluster.